ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

[BUG]: Assertion failed: seen_queries.insert(next_seq_id).second #34

Closed ptrebert closed 1 year ago

ptrebert commented 1 year ago

Describe the bug: run_fcsgx.py fails with the above message. I strongly suspect this is triggered when two sequences in the input FASTA files have the same name.

To Reproduce: Use the same sequence name more than once across different input files.

Software versions (please complete the following information): See #31

Log Files: Not needed.

Additional context: This was triggered when using a manifest listing several FASTA files as input for a single run. A name clash is to be expected if several input datasets are created in the same way (here: the same assembler following a single naming scheme).
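
To confirm that a name clash is the cause before starting a run, the input files can be scanned for repeated sequence IDs. The following is a minimal sketch and not part of the FCS tools: the function name `find_duplicate_ids` is hypothetical, and it assumes uncompressed FASTA files in which the sequence ID is the first whitespace-delimited token of each header line.

```python
from collections import defaultdict


def find_duplicate_ids(fasta_paths):
    """Return {seq_id: [paths, ...]} for every ID seen more than once."""
    seen = defaultdict(list)
    for path in fasta_paths:
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # Assumption: the ID is the first token after '>'.
                    fields = line[1:].split()
                    seq_id = fields[0] if fields else ""
                    seen[seq_id].append(path)
    # Keep only IDs that were recorded more than once, within or across files.
    return {sid: paths for sid, paths in seen.items() if len(paths) > 1}
```

Calling `find_duplicate_ids` with the FASTA paths listed in the manifest returns an empty dictionary when all names are unique; otherwise each key is a clashing ID and each value lists the files it was found in.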

A user-friendly solution would be to introduce another command-line parameter that accepts a list of sample labels, which are then used in addition to the FASTA header to identify a sequence. I guess this would require more work on the internals of the FCS tools, so the requirement for unique sequence names across all input files should at least be made explicit in the wiki where the use of a manifest file is explained.
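
Until something like that exists, the same effect can be approximated on the user side by rewriting the FASTA headers before the run. This is only a sketch of that workaround under stated assumptions: `prefix_fasta_ids` is a hypothetical helper, not an FCS feature; it expects uncompressed FASTA; and the underscore separator is an arbitrary choice that may need adjusting to whatever characters the downstream tools accept in sequence IDs.

```python
def prefix_fasta_ids(in_path, out_path, label, sep="_"):
    """Copy a FASTA file, prepending `label` + `sep` to every sequence ID."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Only header lines change; sequence lines pass through as-is.
                line = ">" + label + sep + line[1:]
            dst.write(line)
```

Running this once per input file with a distinct label makes the IDs unique across the manifest. The trade-off is that any output keyed by sequence name will then carry the prefixed names, so the label has to be stripped again if the original names are needed downstream.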

Best, Peter

pstrope commented 1 year ago

Regarding the manifest file:

There will be a new version release soon.

thanks!