Open gn01786955 opened 7 years ago
Hi @gn01786955, I see the "too many open files" error, but there seems to be an issue before that (which may be the root cause). After GASV clustering, the program breaks the clusters into independent subproblems which can be run in parallel. However, the lines below indicate that something is amiss:
SPLITTING CLUSTERS FILE FOR MBSV...
0 clusters and 0 fragments
Final Iteration: 0
There are 0 clusters listed. I see you have a .clusters file listed (though you call it tig105.m5 - is this a mistake?), but the support of each cluster (the second column) seems to be 0. This cannot be correct, since each cluster must have at least one read in it.
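One quick way to confirm this is to count the clusters and check how many report zero support. The following is only a minimal sketch: it assumes the .clusters file is whitespace-delimited with the support in the second column (as described above), and the sample lines are made up for illustration rather than taken from real GASV output.

```python
def summarize_clusters(lines):
    """Return (total clusters, clusters with zero support).

    Assumption: each non-empty, non-comment line is one cluster and the
    second whitespace-delimited column holds its support (read count).
    """
    total = 0
    zero_support = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        if len(fields) < 2:
            continue  # not enough columns to hold a support value
        try:
            support = int(fields[1])
        except ValueError:
            continue  # second column is not an integer; skip the line
        total += 1
        if support == 0:
            zero_support += 1
    return total, zero_support


# Made-up sample lines, for illustration only.
sample = ["c1\t0\t100,200", "c2\t3\t5,6", "", "#header"]
total, zero = summarize_clusters(sample)
print(f"{total} clusters, {zero} with zero support")
```

To run it on a real file, pass an open file handle instead of the sample list, for example `summarize_clusters(open("mbsv-RunGASV/gasv.in.clusters"))`.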
You said there were errors earlier - please provide these and I will take a closer look. To answer your questions at the bottom of the issue,
(1) Is my tig01.m5 file too large? Nope, it should work fine (though it may take a while to run).
(2) I split my tig01.m5 file into two files. Will that affect the SV results? Yes. SV calls depend on the number of reads that support them, so you must provide the whole file. MBSV relies on breaking the data into independent subproblems, as indicated above, so this is where you will be able to see the performance improvement.
(3) How can I tell which calls in the tig105.m5 results are insertions or inversions? The following information can be found in the GASV User Guide:
I haven't extensively looked for insertions: only deletions, inversions, and translocations.
@annaritz Why do I+ and I- both have StartLocRange and EndLocRange reported? My understanding is that for I+ only the StartLocRange should be meaningful, and for I- only the EndLocRange. Am I wrong about that?
I just got the "Too many open files" exception too. There is only one reference genome, whose header is chr1. The error is:
RUNNING GASV
writing to output directory mbsv-RunGASV/
java -Xms2g -Xmx5g -jar /afs/nd.edu/user34/szhu3/loonlocal/openbiosrc/GASV/gasv/bin/GASV.jar
--cluster --batch --maximal --output regions --nohead --minClusterSize 1 --outputdir mbsv-RunGASV/ --verbose mbsv-RunGASV/gasv.in
Using window size of 43320
ClusterESP: processing chr 1, chr1
Exception in thread "main" java.io.FileNotFoundException: mbsv-RunGASV/binned-esps/intrachrom-longread_26025_0.0-1.0 (Too many open files)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at java.io.FileReader.<init>(FileReader.java:58)
at gasv.main.ReadESP.<init>(Unknown Source)
at gasv.main.ReadInput.createReadFiles(Unknown Source)
at gasv.main.ReadInput.readWindowFromFiles(Unknown Source)
at gasv.main.ClusterESP.clusterESP(Unknown Source)
at gasv.main.GASVMain.main(Unknown Source)
Final file is mbsv-RunGASV/gasv.in.clusters
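A rough diagnostic for this exception is to compare the number of files under binned-esps with the per-process limit on open file descriptors. The sketch below rests on an assumption suggested by the stack trace but not confirmed anywhere in this thread: that GASV opens many of the binned files at the same time. The `reserve` value is an arbitrary illustrative margin, and the `resource` module is available on Unix-like systems only.

```python
import os
import resource


def count_files(directory):
    """Count regular files directly inside `directory` (0 if it is missing)."""
    if not os.path.isdir(directory):
        return 0
    return sum(1 for entry in os.scandir(directory) if entry.is_file())


def may_exceed_fd_limit(n_files, soft_limit, reserve=16):
    """True if opening n_files at once could pass the soft descriptor limit.

    `reserve` is a hypothetical margin for descriptors the process already
    holds (stdin, stdout, the jar, and so on).
    """
    if soft_limit == resource.RLIM_INFINITY:
        return False  # no limit configured
    return n_files + reserve > soft_limit


# Soft and hard limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Directory name taken from the error message above.
n_binned = count_files("mbsv-RunGASV/binned-esps")

print(f"binned files: {n_binned}, soft limit: {soft}, hard limit: {hard}")
if may_exceed_fd_limit(n_binned, soft):
    print("The binned files alone could exhaust the open-file limit.")
```

If the count does exceed the soft limit, raising it in the shell with `ulimit -n` before launching the run is one thing to try, up to the hard limit reported above.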
@annaritz The m5 file was produced by aligning my reference genome to the raw reads.
my tig01.m5 message
my tig105.m5 message
I used M5toMBSV to process my tig105.m5 file and got an error, but the other file ran successfully.
My questions are:
(1) Is my tig01.m5 file too large?
(2) I split my tig01.m5 file into two files. Will that affect the SV results?
(3) How can I tell which calls in the tig105.m5 results are insertions or inversions?
My results for the tig105.m5 file:
Thank you for your help