Closed nick-youngblut closed 3 years ago
I tried converting the full contig names:
NODE_18_length_62406_cov_15.570288
NODE_37_length_46852_cov_20.727739
NODE_157_length_24733_cov_33.082097
...
...to the truncated version as specified in README:
NODE_1,1
NODE_2,1
NODE_3,1
This seems to have worked. I'm guessing that graphbin2 automatically deals with the extra spades contig naming info in the contigs fasta and gfa (then why not also in the --binned file)?
The output is also in the same truncated contig name format:
NODE_1,1
NODE_2,1
NODE_3,1
...which then affects downstream mapping of these node IDs to the contig fasta (e.g., when using DAS-Tool).
If graphbin2 requires the truncated node naming, then it would be helpful if it wrote a new version of the contig fasta with truncated names.
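Until that is handled upstream, a workaround along these lines could map the truncated IDs back to the full names in the contig fasta. This is only a rough sketch: the function names `truncated_to_full` and `restore_names` are hypothetical (not part of graphbin2), and it assumes the headers follow the SPAdes pattern shown above and that the output rows are `contig,bin`.

```python
import re

# Matches SPAdes-style headers such as NODE_18_length_62406_cov_15.570288
SPADES_NAME = re.compile(r"^(NODE_\d+)_length_\d+_cov_\d+(?:\.\d+)?")


def truncated_to_full(fasta_lines):
    """Map truncated IDs (NODE_18) to the full names found in FASTA headers."""
    mapping = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if not fields:
            continue  # bare '>' with no name
        match = SPADES_NAME.match(fields[0])
        if match:
            mapping[match.group(1)] = fields[0]
    return mapping


def restore_names(binned_lines, mapping):
    """Rewrite 'NODE_18,1' rows so they carry the full contig name again."""
    restored = []
    for row in binned_lines:
        row = row.strip()
        if not row:
            continue  # skip blank lines
        contig_id, bin_id = row.split(",", 1)
        # IDs with no match in the fasta are passed through unchanged
        restored.append(f"{mapping.get(contig_id, contig_id)},{bin_id}")
    return restored
```

The restored rows can then be matched against the original contig fasta without rewriting the fasta itself.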
Hello @nick-youngblut,
Thanks for posting this issue. As mentioned in the input format section, the current version of GraphBin2 requires the user to input truncated contig ids as shown. However, I agree with you that it is better to let users input the original contig ids rather than the truncated ones. I will update the code accordingly to take in and output the original contig ids. Until then, I will leave this issue open.
Fixed the contig naming issue for SPAdes version of GraphBin2. Now the user can input the original contig names provided by SPAdes in the initial binning result. GraphBin2 output will also contain the original contig names.
Commit ID: 0f6f5a4677c4f7fa5989f556966526473308ae0d
Closing issue after fixing.
I'm running graphbin2 with spades input and getting the following error:
I checked the code and realized that:
...is expecting a bin.csv file with contigs simply labeled as:
...but spades names contigs as:
So do the contig names in the output of spades (contig fasta & assembly graph) need to be changed from NODE_\d+_length_\d+_cov_\d+\.\d+ to NODE_\d+, or do the nodes just need to be changed in the --binned input file?
Why not just parse the entire, original contig name:
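A minimal sketch of what parsing both forms could look like. This is an illustration, not graphbin2's code: the helper name `contig_number` is hypothetical, and it assumes the node number is all that is needed to identify a contig.

```python
import re

# Accepts both 'NODE_18' and 'NODE_18_length_62406_cov_15.570288'
NODE_PREFIX = re.compile(r"^NODE_(\d+)(?:_|$)")


def contig_number(name):
    """Return the node number from a truncated or full SPAdes contig name."""
    match = NODE_PREFIX.match(name.strip())
    if match is None:
        # Say exactly which name failed instead of failing silently
        raise ValueError(f"Not a SPAdes-style contig name: {name!r}")
    return int(match.group(1))
```

With something like this, the --binned file, the contig fasta, and the assembly graph could all use the original names, and truncated names would keep working.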
Also, a blanket except: with a generic error message and no traceback will make it hard for users to figure out what the problem is. Example from the code:
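For contrast, here is a rough sketch of narrower handling that keeps the traceback. It is not graphbin2's code: the function name `read_binned_csv` and the logger name are hypothetical, and it assumes the binning result is a `contig,bin` CSV with integer bin IDs.

```python
import csv
import logging

logger = logging.getLogger("binning_input")  # hypothetical logger name


def read_binned_csv(path):
    """Read 'contig,bin' rows, reporting the offending row on failure."""
    bins = {}
    try:
        with open(path, newline="") as handle:
            for line_no, row in enumerate(csv.reader(handle), start=1):
                if not row:
                    continue  # skip blank lines
                if len(row) != 2:
                    raise ValueError(
                        f"{path}:{line_no}: expected 'contig,bin', got {row!r}"
                    )
                contig, bin_id = row
                bins[contig] = int(bin_id)
    except (OSError, ValueError):
        # Catch only the expected failures, log the full traceback, re-raise
        logger.exception("Could not read the binning result %s", path)
        raise
    return bins
```

Catching only `OSError` and `ValueError` lets unexpected bugs surface on their own, and `logger.exception` records the traceback alongside the message.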