hildebra / lotus2

Amplicon sequencing pipelines suitable for SSU (16S, 18S), LSU (23S, 28S) and ITS.
http://lotus2.earlham.ac.uk/
GNU General Public License v3.0

custom DB causes ParseError thrown: Unexpected character (#44)

Closed: we32i225 closed this issue 10 months ago

we32i225 commented 10 months ago

Hi, I'm very new to Lotus2, so this might be trivial. I have been trying to use the AF_full_region database produced by the Anaerobic Fungi Network (https://anaerobicfungi.org/databases/), but I got the error below:

[M::main] CMD: /root/lotus2//bin//minimap2-2.17_x64-linux/minimap2 -x sr --sr -u both --secondary=no -N 30 -c -t 1 -o Will_AF/output_AFN/tmpFiles//otu_seeds.fna.phiX.0.cont_hit.paf /root/lotus2//DB//phiX.fasta Will_AF/output_AFN/tmpFiles//otu_seeds.fna
[M::main] Real time: 0.005 sec; CPU: 0.004 sec; Peak RSS: 0.004 GB
Loading Subject Sequences and Ids...
ParseError thrown: Unexpected character '-' found. Make sure that the file is standards compliant. If you get an unexpected character warning make sure you have set the right program parameter (-p), i.e. Lambda expected nucleic acid alphabet, maybe the file was protein?

In response, I substituted '' for the '-' characters, thinking Lotus2 couldn't understand '-'. This still didn't work: the same error appeared, but with ParseError thrown: Unexpected character '' found.

Is the problem the characters in my DB files? If so, which character can I substitute that Lotus2 can read? Many thanks :) (Apologies, I can't upload the AF_full_region FASTA file to GitHub.)
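To see which characters in a custom database trip the parser, a small scan of the sequence lines can help. The sketch below is an illustration only, not part of LotuS2. It assumes the aligner accepts the IUPAC nucleotide codes in either case (the exact alphabet Lambda accepts should be checked in its documentation), and the script name check_fasta_chars.py is hypothetical.

```python
import sys
from collections import Counter

# Assumption: the aligner accepts IUPAC nucleotide codes, upper or lower case.
# Verify against the Lambda documentation before relying on this set.
IUPAC_NT = set("ACGTURYSWKMBDHVN")


def find_unexpected_chars(lines):
    """Count characters outside IUPAC_NT in the sequence lines of a FASTA
    file and remember the first record ID in which each one appears."""
    counts = Counter()
    first_seen = {}
    record_id = ""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Use only the text before the first whitespace as the record ID.
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            continue
        for ch in line:
            if ch.upper() not in IUPAC_NT:
                counts[ch] += 1
                first_seen.setdefault(ch, record_id)
    return counts, first_seen


if __name__ == "__main__":
    # Usage: python check_fasta_chars.py AF_full_region.fas [more.fas ...]
    for path in sys.argv[1:]:
        with open(path) as handle:
            counts, first_seen = find_unexpected_chars(handle)
        for ch, n in counts.most_common():
            print(f"{path}: {ch!r} x{n}, first in record {first_seen[ch]}")
```

Printing the character with `!r` makes invisible characters (spaces, tabs, stray carriage returns) show up explicitly, which matters when the error message itself shows an empty-looking character.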

hildebra commented 10 months ago

Hey, no, this wasn't the "-" character in the command arguments; it probably relates to a character in the .fas file. Can you try running lotus2 with the same parameters, but with the standard ITS database (UNITE)? If the error does not occur, it's a problem with the new .fas file. Also note that the .tax file you submit needs a specific format; please see here: https://github.com/hildebra/lotus2#coustom-reference-database best, Falk

we32i225 commented 10 months ago

Hi Falk, thanks for getting back so quickly! The sequencing is actually of the LSU region (unusual for fungi, I know). I used the SLV LSU DB and it ran just fine, so I think the problem is my custom .fas and taxonomy files. Just to clarify: can Lotus2 read '-' characters, or will I have to substitute them? And does the FASTA file need to be formatted in a certain way as well, or just the taxonomy file?

Many thanks

hildebra commented 10 months ago

Hey Wei, I suspect that the error occurred inside the aligner. Can you check the log file with the commands called by LotuS2 (LotuSLogs/LotuS_commands.log or similar; I'm currently not on the cluster to check) and just try to rerun the last command there? As far as I remember, LotuS2 doesn't have specific formatting requirements for the .fas and .tax files, just matching headers, of course (everything after the first space character in a .fas header is ignored). best, Falk
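Since matching labels are the main requirement, one quick consistency check is to compare the FASTA IDs (the text before the first space, as described above) with the IDs in the taxonomy file. This is a hypothetical helper, not a LotuS2 tool. It assumes a tab-separated .tax file with the ID in the first column; if the file has a header row, that row will simply show up as an unmatched ID.

```python
import sys


def fasta_ids(lines):
    """IDs from FASTA headers: the text between '>' and the first whitespace."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def tax_ids(lines):
    """IDs from the first tab-separated column of a taxonomy file.
    Assumption: one entry per line, ID in column 1; blank lines are skipped."""
    ids = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip():
            ids.append(line.split("\t", 1)[0])
    return ids


def compare_ids(fas, tax):
    """Return (IDs only in the FASTA file, IDs only in the taxonomy file)."""
    fas_set, tax_set = set(fas), set(tax)
    return sorted(fas_set - tax_set), sorted(tax_set - fas_set)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_db_ids.py custom.fas custom.tax
    with open(sys.argv[1]) as f:
        fas = fasta_ids(f)
    with open(sys.argv[2]) as f:
        tax = tax_ids(f)
    only_fas, only_tax = compare_ids(fas, tax)
    print(f"{len(only_fas)} IDs only in .fas, e.g. {only_fas[:5]}")
    print(f"{len(only_tax)} IDs only in .tax, e.g. {only_tax[:5]}")
```

A clean result here does not rule out bad characters in the sequences themselves; it only confirms the two files describe the same records.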

hildebra commented 10 months ago

Hey Wei, I think the error occurred in the aligner. Can you check the LotuSLogs/Lotus_cmd.log (or similar) file and rerun the last logged command? There shouldn't be any specific requirements for the formatting of the FASTA and tax files, beyond matching labels. best, Falk

we32i225 commented 10 months ago

Hi Falk, thanks for the help. I tried re-running the last command line, but that didn't fix it; I also tried using UNITE (which worked fine). I think my DB files are likely not formatted correctly and this is not actually an issue with Lotus2, apologies. best, Wei

hildebra commented 10 months ago

Sorry to hear that, Wei. It might be that an unsupported character (e.g. "-") is being used in the DB FASTA sequences. I hope you still find LotuS2 useful, though. best, Falk
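If the "-" characters in the database are alignment gaps (an assumption; aligned reference files commonly use "-" and "." as gap symbols), deleting them rather than substituting another character should leave plain, unaligned sequences. Substituting a different character only helps if that character is itself in the aligner's nucleotide alphabet. A minimal sketch, with a hypothetical script name degap_fasta.py and an assumed gap set:

```python
import sys

# Assumption: "-" and "." are alignment-gap symbols, not sequence content.
GAP_CHARS = "-."
_STRIP_GAPS = str.maketrans("", "", GAP_CHARS)


def degap_fasta(lines):
    """Yield FASTA lines with gap characters removed from sequence lines.
    Headers pass through unchanged; sequence lines that become empty are dropped."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            yield line
        else:
            seq = line.translate(_STRIP_GAPS)
            if seq:
                yield seq


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python degap_fasta.py AF_full_region.fas AF_full_region.degapped.fas
    with open(sys.argv[1]) as fin, open(sys.argv[2], "w") as fout:
        for out_line in degap_fasta(fin):
            fout.write(out_line + "\n")
```

Records that consisted only of gaps keep their header but end up with no sequence, so it is worth checking for such empty records before pointing LotuS2 at the cleaned file.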