torognes / swarm

A robust and fast clustering method for amplicon-based studies
GNU Affero General Public License v3.0

Error occurs in amplicon_contingency_table.py #128

Closed debjitde01 closed 5 years ago

debjitde01 commented 5 years ago

Hello sir

I am trying to build an amplicon contingency table with this script (python amplicon_contingency_table.py P11_dereplicated.fasta > amplicons_table.csv), but I get an error. Please help me solve it. The details of the error are given below.

pkd@pkd-HP-406-G1-MT:~/swarm-master/scripts$ python amplicon_contingency_table.py P11_dereplicated.fasta > amplicons_table.csv
Traceback (most recent call last):
  File "amplicon_contingency_table.py", line 101, in <module>
    main()
  File "amplicon_contingency_table.py", line 65, in main
    all_amplicons, amplicons2samples, samples = fasta_parse()
  File "amplicon_contingency_table.py", line 41, in fasta_parse
    amplicon, abundance = line.strip(">;\n").split(separator)
ValueError: need more than 1 value to unpack

frederic-mahe commented 5 years ago

Hello @debjitde01

There seems to be a problem with your fasta file. Could you please show us a sample of it? The first few lines should be enough.

debjitde01 commented 5 years ago

Here are the first few sequences of my fasta file:

5c7f43d336cef5e1c1d3122a9a827cf5a921b091_455 CCTACGGGTGGCTGCAGTCGAGAATCTTCCGCAATGGGCGAAAGCCTGACGGAGCGACGCCGCGTGACTGAAGAAGTCTTTCGGGACGTAAAGGTCTTTTATGAGGGAGAACATTTCGATAGTACCTCATGAATAAGGGGTTGCTAAACTCGTGCCAGCAGCAGCGGTAATACGAGTGCCCCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGTGTAGGCGGTCGCGTTAGTCATTCGTCAAAGCCTCCGGCTTAACCGGAGAATTGCGAATGAAACGGCGCGACTCGAGAGTGTGAGAGGTTTGCGGAACTCATGGTGTAGGGGTGAAATCCGTTGATATCATGGGGAACACCAAAAGCGAAGGCAGCAAACTGGCGCATTTCTGACGCTGAAACACGAAAGCGTAGGTAGCGAATGGGATTAGATACCCGAGTAGTC

f61538959e98c8dcc902e1e0eb094e38be627662_381 CCTACGGGTGGCTGCAGTCGAGAATCTTCCGCAATGGGCGAAAGCCTGACGGAGCGACGCCGCGTGACTGAAGAAGTCTTTCGGGACGTAAAGGTCTTTTATGAGGGAGAACATTTCGATAGTACCTCATGAATAAGGGGTTGCTAAACTCGTGCCAGCAGCAGCGGTAATACGAGTGCCCCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGTGTAGGCGGTCGCGTTAGTCATTCGTCAAAGCCTCCGGCTTAACCGGAGAATTGCGAATGAAACGGCGCGACTCGAGAGTGTGAGAGGTTTGCGGAACTCATGGTGTAGGGGTGAAATCCGTTGATATCATGGGGAACACCAAAAGCGAAGGCAGCAAACTGGCGCATTTCTGACGCTGAAACACGAAAGCGTAGGTAGCGAATGGGATTAGATACCCCAGTAGTC

05c0d2cf7f2cb6c2fd805e556b76a59e4e075f64_156 CCTACGGGTGGCAGCAGTAAGGAATATTGGACAATGGACGCAAGTCTGATCCAGCCATGCCGCGTGAAGGATTAAGGTCCTCTGGATTGTAAACTTCTTTTATTTGGGACGAAAAAAGATCATTCTTGATCACTTGACGGTACCAGATGAATAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTATCCGGATTCACTGGGTTTAAAGGGTGCGTAGGCGGGTTTGTAAGTCAGTGGTGAAATCTCGGAGCTTAACTCTGAAACTGCCATTGATACTATAAGTCTTGAATATTGCGGAGGTAAGCGGAATATGTCATGTAGCGGTGAAATGCTTAGATATGACATAGAACACCCATTGCGAAGGCAGCTTACTACACATATATTGACGCTGAGGCACGAAAGCGTGGGGATCAAACAGGATTAGATACCCTAGTAGTC

88002d0d75606500dc50440b631a37e8a5ee1518_156 CCTACGGGTGGCTGCAGTCGAGAATCTTCCACAATGGACGAAAGTCTGATGGAGCGACGCCGCGTGATTGATGAAGTCCCTCTGGGACGTAAAGATCTTTTATGAGGGAAGAAGTTTATTGACTGTACCTCATGAATAAGAGGCTCCTAATCTCGTGCCAGCAGGAGCGGTAATACGAGAGCCTCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGGTTGTTTTGTTAGTCTTTTGTCAAAGCCCCGAGCTTAACTTGGGAGAGGCGAAAGAAACGGCAAGACTTGAAAGTGCGAGAGGTATACGGAACTCATGGTGTAGGGGTGAAATCCGTTGATATCATGGGGAACACCAAATGCGAAGGCAGTATACTGGCGCATATTTGACACTGAAGCACGAAAGCGTGGGTAGCGAATGGGATTAGATACCCTAGTAGTC

450f29d88703903b912293cb3b17c3b4f572643c_117 CCTACGGGTGGCAGCAGTAAGGAATATTGGACAATGGACGCAAGTCTGATCCAGCCATGCCGCGTGAAGGATTAAGGTCCTCTGGATTGTAAACTTCTTTTATTTGGGACGAAAAAAGATCATTCTTGATCACTTGACGGTACCAGATGAATAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTATCCGGATTCACTGGGTTTAAAGGGTGCGTAGGCGGGTTTGTAAGTCAGTGGTGAAATCTCGGAGCTTAACTCTGAAACTGCCATTGATACTATAAGTCTTGAATATTGCGGAGGTAAGCGGAATATGTCATGTAGCGGTGAAATGCTTAGATATGACATAGAACACCCATTGCGAAGGCAGCTTACTACACATATATTGACGCTGAGGCACGAAAGCGTGGGGATCAAACAGGATTAGATACCCGTGTAGTC

b9ae0db8d6d66bd6ccaf303f3ad6f802777e72d7_115 CCTACGGGTGGCTGCAGTCGAGAATCTTCCACAATGGACGAAAGTCTGATGGAGCGACGCCGCGTGATTGATGAAGTCCCTCTGGGACGTAAAGATCTTTTATGAGGGAAGAAGTTTATTGACTGTACCTCATGAATAAGAGGCTCCTAATCTCGTGCCAGCAGGAGCGGTAATACGAGAGCCTCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGGTTGTTTTGTTAGTCTTTTGTCAAAGCCCCGAGCTTAACTTGGGAGAGGCGAAAGAAACGGCAAGACTTGAAAGTGCGAGAGGTATACGGAACTCATGGTGTAGGGGTGAAATCCGTTGATATCATGGGGAACACCAAATGCGAAGGCAGTATACTGGCGCATATTTGACACTGAAGCACGAAAGCGTGGGTAGCGAATGGGATTAGATACCCCGGTAGTC

There is a > sign at the beginning of each fasta header, which is not shown here.

frederic-mahe commented 5 years ago

@debjitde01 The script amplicon_contingency_table.py is expecting abundance values in usearch/vsearch format (;size=), whereas your fasta file uses _ to separate the abundance values. You can either edit your fasta file (sed -i '/^>/ s/_/;size=/'), or change the script (separator = ";size=" should be separator = "_").
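To illustrate the mismatch, here is a minimal Python 3 sketch built around the single parsing line shown in the traceback. The helper name `parse_header` and its `separator` parameter are hypothetical and are not taken from the script; the header is the first one from the sample above, with the leading > restored.

```python
def parse_header(line, separator=";size="):
    """Split a fasta header into (amplicon, abundance).

    Hypothetical helper: it only wraps the parsing line shown in the
    traceback and is not a function from the swarm repository.
    """
    amplicon, abundance = line.strip(">;\n").split(separator)
    return amplicon, int(abundance)


header = ">5c7f43d336cef5e1c1d3122a9a827cf5a921b091_455\n"

# With the ";size=" separator, split() returns a single item for this
# header, so unpacking into two names raises ValueError.
try:
    parse_header(header)
except ValueError as error:
    print("failed:", error)

# Option 1: split on "_" instead (the script-side change).
print(parse_header(header, separator="_"))

# Option 2: rewrite the header to the ";size=" format first, replacing
# only the first "_" as the sed substitution does.
converted = header.replace("_", ";size=", 1)
print(parse_header(converted))
```

Both options yield the same amplicon name and an abundance of 455 for this header. Rewriting the fasta file is probably the safer choice if other usearch/vsearch-style tools will read it later.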

But there might be some confusion here. The goal of the script amplicon_contingency_table.py is to parse many fasta files at once and to produce a contingency table listing each amplicon and the samples where it occurs. That script does not produce an OTU contingency table.

If an OTU table is what you are interested in, you might want to follow the pipeline described on this page.
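As a rough illustration of what an amplicon contingency table contains, here is a small Python 3 sketch. It is not the script's implementation: the function `build_table`, the sample names, and the in-memory input are all made up, and it assumes headers in the ;size= format with amplicons identified by their header name.

```python
from collections import defaultdict


def build_table(samples, separator=";size="):
    """Return (sorted sample names, {amplicon: {sample: abundance}}).

    `samples` maps a sample name to an iterable of fasta lines.
    Hypothetical helper, for illustration only.
    """
    table = defaultdict(dict)
    for sample, lines in samples.items():
        for line in lines:
            if line.startswith(">"):  # header lines carry the abundance
                amplicon, abundance = line.strip(">;\n").split(separator)
                table[amplicon][sample] = int(abundance)
    return sorted(samples), dict(table)


# Made-up input: two samples that share one amplicon.
samples = {
    "sample_A": [">amp1;size=10\n", "ACGT\n", ">amp2;size=3\n", "ACGA\n"],
    "sample_B": [">amp1;size=4\n", "ACGT\n"],
}

names, table = build_table(samples)

# One row per amplicon, one column per sample, plus a row total.
print("amplicon", *names, "total", sep="\t")
for amplicon, counts in table.items():
    row = [counts.get(name, 0) for name in names]
    print(amplicon, *row, sum(row), sep="\t")
```

Each row describes one amplicon across samples. An OTU table would instead aggregate amplicons into clusters, which is why a separate pipeline is needed for it.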

debjitde01 commented 5 years ago

Thank you. The problem is now sorted out.

frederic-mahe commented 5 years ago

Good. I'll close this issue then. Feel free to open a new issue if need be.

debjitde01 commented 5 years ago

Sure, and thank you so much for your help.

Debjit De
Research Scholar
C/O Dr. Paltu Kumar Dhal, Assistant Professor
Department of Life Science & Biotechnology
Jadavpur University, Kolkata - 700032

On Thu, Feb 7, 2019 at 1:47 PM Frédéric Mahé notifications@github.com wrote:

Good. I'll close this issue then. Feel free to open a new issue if need be.
