torognes / swarm

A robust and fast clustering method for amplicon-based studies
GNU Affero General Public License v3.0

Issues with plotting: OTU not found #167

Closed naurasd closed 2 years ago

naurasd commented 2 years ago

Hi,

I ran swarm clustering with d = 13 (yes, very large, I know; I am trying a few parameters others have used, and I am working with animal COI metabarcoding data with high intra-specific variability). It went smoothly.

I would like to plot the 3rd OTU. I am running the following command (an adjusted version of the one in your paper's supplementary material, Supp1):

python graph_plot.py -s statistics.txt -i internal.txt -o 3

The statistics.txt and the internal.txt files are the files that have been created for the -s and -i parameters when performing the initial clustering step with swarm. I am leaving out the -d parameter for now. As far as I can see, it defaults to zero when not defined.

However, I get this error message:

python graph_plot.py -s statistics.stats -i internal.struct -o 3
Error: OTU does not exists or contains only one element.
Reading target OTU
Parsing amplicon relationships

Why does the 3rd OTU not exist? I have more than 9,000 OTUs.

This is what my statistics.txt file looks like (first rows shown as an example). Ignore the bold font in the first row.


20 | 481765 | ASV1 | 362738 | 0 | 3 | 36
-- | -- | -- | -- | -- | -- | --
16 | 386950 | ASV2 | 210150 | 0 | 1 | 12
168 | 476890 | ASV3 | 176472 | 0 | 5 | 55
11 | 145906 | ASV4 | 143517 | 0 | 1 | 6
35 | 244657 | ASV6 | 101936 | 0 | 2 | 20
7 | 187436 | ASV7 | 88833 | 0 | 1 | 13

This is what my internal.txt file looks like (first rows shown as an example). Ignore the bold font in the first row.

ASV1 | ASV20 | 2 | 1 | 1
-- | -- | -- | -- | --
ASV1 | ASV41 | 1 | 1 | 1
ASV1 | ASV71 | 1 | 1 | 1
ASV1 | ASV79 | 1 | 1 | 1
ASV1 | ASV477 | 1 | 1 | 1
ASV1 | ASV1299 | 1 | 1 | 1
ASV1 | ASV1985 | 2 | 1 | 1

I would appreciate your help. The -s and -i output files are written by swarm based on your algorithm, so I don't see why my OTU isn't found. The same problem occurred when I told swarm to write the files as .stats and .struct files, as in your code from the supplementary material.

Thanks so much.

Nauras

frederic-mahe commented 2 years ago

@naurasd thank you for trying swarm.

python graph_plot.py -s statistics.txt -i internal.txt -o 3

The statistics.txt and the internal.txt files are the files that have been created for the -s and -i parameters when performing the initial clustering step with swarm.

Yes, graph_plot.py --internal_structure internal.txt corresponds to swarm --internal-structure internal.txt, but graph_plot.py --swarms swarms.txt corresponds to swarm --output swarms.txt (i.e. swarm's default output), not to swarm --statistics-file stats.txt.

I realize now that the mixed option names are confusing (-s for graph_plot and -o for swarm). Sorry about that.
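
A minimal sketch of why passing the statistics file may trigger this error. It is an illustration only, not the actual code of graph_plot.py. It assumes that swarm's default output holds one OTU per line with amplicon identifiers separated by spaces, and that the statistics file is tab-separated (the pipes in the post above come from pasting). The helper `read_otu` and the OTU memberships below are hypothetical.

```python
def read_otu(lines, otu_number):
    """Return the space-separated fields of the n-th line (1-based).

    Hypothetical helper: it mimics reading "the n-th OTU" from a file
    in which each line lists the amplicons of one OTU.
    """
    for i, line in enumerate(lines, start=1):
        if i == otu_number:
            return line.strip().split(" ")
    return []  # fewer lines than requested


# Invented, truncated example of swarm's default output (swarm --output):
# one OTU per line, amplicon identifiers separated by spaces.
swarms_file = [
    "ASV1 ASV20 ASV41 ASV71",
    "ASV2 ASV35",
    "ASV3 ASV12 ASV88 ASV90",
]

# First rows of the statistics file from the post above,
# assumed here to be tab-separated.
stats_file = [
    "20\t481765\tASV1\t362738\t0\t3\t36",
    "16\t386950\tASV2\t210150\t0\t1\t12",
    "168\t476890\tASV3\t176472\t0\t5\t55",
]

otu_from_swarms = read_otu(swarms_file, 3)
otu_from_stats = read_otu(stats_file, 3)

print(len(otu_from_swarms))  # several amplicons: something to plot
print(len(otu_from_stats))   # a single field: looks like a one-element OTU
```

Under these assumptions, the third line of the statistics file contains no spaces, so it is read as an OTU with a single element, which matches the wording of the error message.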

naurasd commented 2 years ago

hi @frederic-mahe

thanks for getting back to me about this. Will try again then.

However, since you mention the plotting option in your paper, and given how I got it wrong when trying to understand the procedure from the supplementary material, I think it would be helpful to add an explanation with an example to the GitHub repository.

Cheers,
Nauras

frederic-mahe commented 2 years ago

Thanks for the suggestion.

I've added an example to the help message of the graph_plot.py script (commit f3a7c87cc56c28e8594be130a2e909c68c286d03).