hsgweon / pipits

Automated pipeline for analyses of fungal ITS from the Illumina sequencing platform
GNU General Public License v3.0

Error when running RDP classification: Error: None zero returncode: classifier -Xms4g -Xmx16g classify -t pipits_db/UNITE_retrained_27.10.2022/UNITE_retrained/rRNAClassifier.properties -o process_out/assigned_taxonomy_rdp_raw.txt process_out/intermediate/input_nr_otus_nonchimeras_relabelled.fasta #57

menglemore opened this issue 8 months ago

menglemore commented 8 months ago

Hi,

This is my first time using PIPITS. I had no issues with the tutorial on the mock samples, but during pipits_process for my actual samples I ran into this error: Error: None zero returncode: classifier -Xms4g -Xmx16g classify -t pipits_db/UNITE_retrained_27.10.2022/UNITE_retrained/rRNAClassifier.properties -o process_out/assigned_taxonomy_rdp_raw.txt process_out/intermediate/input_nr_otus_nonchimeras_relabelled.fasta

Here is the output from output.log:

2024-01-11 13:42:54 pipits_process started
2024-01-11 13:42:54 Downloading UNITE trained database, version: 27.10.2022
2024-01-11 13:48:38 ... Unpacking
2024-01-11 13:48:51 ... done
2024-01-11 13:48:51 Downloading database for SINTAX
2024-01-11 13:49:24 ... Unpacking
2024-01-11 13:49:25 ... done
2024-01-11 13:49:25 Downloading WARCUP trained database: 
2024-01-11 13:49:46 ... Unpacking
2024-01-11 13:49:47 ... done
2024-01-11 13:49:47 Downloading UCHIME database for chimera filtering: 
2024-01-11 13:49:55 ... Unpacking
2024-01-11 13:49:55 ... done
2024-01-11 13:49:55 Dereplicating and removing unique sequences prior to picking OTUs
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch

Dereplicating file funits_out/ITS.fasta 100%
882802030 nt in 3525785 seqs, min 100, max 482, avg 250
Sorting 100%
307813 unique sequences, avg cluster 11.5, median 1, max 146595
Writing FASTA output file 100%
2024-01-11 13:49:59 Picking OTUs [VSEARCH]
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch

Reading file process_out/intermediate/input_nr.fasta 100%
84061458 nt in 307813 seqs, min 100, max 482, avg 273
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 18750 Size min 1, max 11914, avg 16.4
Singletons: 13416, 4.4% of seqs, 71.6% of clusters
2024-01-11 13:50:33 Removing chimeras [VSEARCH]
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch

Reading file pipits_db/uchime_reference_dataset_28.06.2017/uchime_reference_dataset_28.06.2017.fasta 100%
16786547 nt in 30555 seqs, min 146, max 2570, avg 549
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Detecting chimeras 100%
Found 96 (0.5%) chimeras, 18638 (99.4%) non-chimeras,
and 16 (0.1%) borderline sequences in 18750 unique sequences.
Taking abundance information into account, this corresponds to
7322 (1.8%) chimeras, 402368 (98.2%) non-chimeras,
and 156 (0.0%) borderline sequences in 409846 total sequences.
2024-01-11 13:50:39 Renaming OTUs
2024-01-11 13:50:39 Mapping reads onto centroids [VSEARCH]
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch

Reading file process_out/intermediate/input_nr_otus_nonchimeras_relabelled.fasta 100%
4421074 nt in 18638 seqs, min 100, max 482, avg 237
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching unique query sequences: 3514479 of 3525785 (99.68%)
2024-01-11 13:54:55 Making OTU table
2024-01-11 13:55:04 Converting classic tabular OTU into a BIOM format [BIOM]
2024-01-11 13:55:10 Assigning taxonomy with VSEARCH-SINTAX [VSEARCH]
vsearch v2.26.1_linux_x86_64, 15.5GB RAM, 12 cores
https://github.com/torognes/vsearch

Reading file pipits_db/UNITE_retrained_27.10.2022.sintax.fa/UNITE_retrained_27.10.2022.sintax.fa 100%
189392914 nt in 326300 seqs, min 140, max 1501, avg 580
Counting k-mers 100%
Creating k-mer index 100%
Classifying sequences 100%
Classified 14934 of 18638 sequences (80.13%)
2024-01-11 13:56:57 Adding SINTAX assignment to OTU table [BIOM]
2024-01-11 13:56:58 Converting OTU table with taxa assignment into a BIOM format [BIOM]
2024-01-11 13:57:00 Phylotyping OTU table
2024-01-11 13:57:04 Assigning taxonomy with UNITE [RDP Classifier]
2024-01-11 16:16:51 Error: None zero returncode: classifier -Xms4g -Xmx16g classify -t pipits_db/UNITE_retrained_27.10.2022/UNITE_retrained/rRNAClassifier.properties -o process_out/assigned_taxonomy_rdp_raw.txt process_out/intermediate/input_nr_otus_nonchimeras_relabelled.fasta

I am unsure what the issue is with the classifier. The SINTAX taxonomic classification did not appear to have any issues. Thank you for your time.

hsgweon commented 8 months ago

@menglemore Can you post the command you used to run pipits_process?

menglemore commented 8 months ago

Hi @hsgweon, here is the command I used:

pipits_process -i funits_out/ITS.fasta -o process_out -l /home/Documents/RawSequences/Biodeg/fungi/pipits/readpairslist.txt --includeuniqueseqs -r -t 8

hsgweon commented 8 months ago

I suspect the issue is to do with RAM (i.e. insufficient RAM). First of all, can you try running with a smaller number of sequences to see whether that works? So: head -n 100 funits_out/ITS.fasta > funits_out/ITS_subset.fasta
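One thing to keep in mind (an assumption on my part about your file layout): head -n 100 counts lines, not sequences, so if ITS.fasta has sequences wrapped over several lines the subset could end mid-record. A sketch that takes the first 50 complete records regardless of wrapping, using only awk:

awk '/^>/{n++} n<=50' funits_out/ITS.fasta > funits_out/ITS_subset.fasta

The exact number doesn't matter much here; the point is only to check whether the RDP step completes on a small input.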

Then run pipits_process: pipits_process -i funits_out/ITS_subset.fasta -o process_out -l /home/Documents/RawSequences/Biodeg/fungi/pipits/readpairslist.txt --includeuniqueseqs -r -t 8

What's the RAM of your system? If you have more than 16GB, you could try increasing it, i.e. if you have 32GB, then add "--Xmx 32g" to the command, so it would be: pipits_process -i funits_out/ITS.fasta -o process_out -l /home/Documents/RawSequences/Biodeg/fungi/pipits/readpairslist.txt --includeuniqueseqs -r -t 8 --Xmx 32g
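To see how much memory is actually free before picking an --Xmx value, something like:

free -h

gives a quick overview; if the "available" column is well below the heap you request, the Java process can fail partway through, which would show up as a non-zero return code from the classifier step.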

menglemore commented 8 months ago

Hi there, I also think it is insufficient RAM. I have 16GB, but it appears that this is still not enough. I used free -m to check, and here are the results:

               total        used        free      shared  buff/cache   available
Mem:           15921        2103       10013           8        3804       13511
Swap:           2047        1278         769

I then tried the first approach of splitting the reads into subsets, but used 1000 reads instead of 100, and still received the same error. Is this normal? Does this mean the RDP classifier will not work?

Thank you.

hsgweon commented 8 months ago

Did you say that it worked on the "mock samples"? If it worked with the test samples, then the RDP classifier had no issue loading the UNITE DB into RAM, and 16GB should be sufficient to run. Can you send me 100 sequences from your funits_out/ITS.fasta so I can test? You can send them to my email address (which you can find on the main GitHub page).
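In the meantime, it may also be worth re-running the failing step by hand from the same working directory and environment, so we can see the underlying Java error rather than just the return code. The command below is copied verbatim from your log:

classifier -Xms4g -Xmx16g classify -t pipits_db/UNITE_retrained_27.10.2022/UNITE_retrained/rRNAClassifier.properties -o process_out/assigned_taxonomy_rdp_raw.txt process_out/intermediate/input_nr_otus_nonchimeras_relabelled.fasta

For pulling out the 100 sequences to send, the awk one-liner above with n<=100 instead of n<=50 would do it.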