Closed gabriellovate closed 3 years ago
Hi @gabriellovate ,
Thanks for raising this issue. We'll work on it! I guess that the read names and the names in your CSV file are mismatched. If there's any chance you can send me your files off-list (kuanhao.chao@gmail.com) that would really help. We'll only use them for fixing this bug!
Howard
I got the same error when following the tutorial here https://sangeranalyser.readthedocs.io/en/latest/content/beginner.html#step-2-loading-and-analysing-your-data, but with my own sequence files
Could the error be related to samples not generating a contig (no data that passes QC) so that there are empty contigs created ?
I think it could well be. Howard - what we need are some QC steps that produce warnings when e.g. (i) any reads are left out of any contigs; (ii) any contigs end up with no reads at all; and anything else we can think users might want to be warned about when assembling all the data...
On Thu, 19 Nov 2020 at 19:03, Thomas Källman notifications@github.com wrote:
Could the error be related to samples not generating a contig (no data that passes QC) so that there are empty contigs created ?
-- Rob Lanfear Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra
www.robertlanfear.com
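The QC warnings suggested above could be sketched roughly as follows. This is a minimal base-R illustration under stated assumptions, not sangeranalyseR code: `check_contig_assignment`, `all_reads` and `contig_reads` are hypothetical names, and a plain named list stands in for however the package actually stores its contigs.

```r
# Hypothetical helper, not part of sangeranalyseR.
# all_reads:    character vector with every read file name that was found
# contig_reads: named list; each element holds the reads assigned to that contig
check_contig_assignment <- function(all_reads, contig_reads) {
  assigned   <- unique(unlist(contig_reads, use.names = FALSE))
  unassigned <- setdiff(all_reads, assigned)
  empty      <- names(contig_reads)[lengths(contig_reads) == 0]

  # (i) reads that were left out of every contig
  if (length(unassigned) > 0) {
    warning(length(unassigned), " read(s) not assigned to any contig: ",
            paste(unassigned, collapse = ", "), call. = FALSE)
  }
  # (ii) contigs that ended up with no reads at all
  if (length(empty) > 0) {
    warning(length(empty), " contig(s) contain no reads: ",
            paste(empty, collapse = ", "), call. = FALSE)
  }
  invisible(list(unassigned = unassigned, empty = empty))
}

# Toy input with made-up file names
reads   <- c("A_1_F.ab1", "A_1_R.ab1", "B_1_F.ab1", "stray.ab1")
contigs <- list(A = c("A_1_F.ab1", "A_1_R.ab1"),
                B = "B_1_F.ab1",
                C = character(0))
qc <- suppressWarnings(check_contig_assignment(reads, contigs))
```

Returning the offending names invisibly, in addition to warning, would let a caller decide whether to drop the empty contigs before alignment or to stop.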
hi @thokall,
I am working on the QC steps now. If you can send me your dataset, I can check where the problem is for you first (kuanhao.chao@gmail.com). Thank you !
Howard
Hi,
My reasoning came from the fact that for some samples I only had forward data, and a subset of these were of very low quality. If I just loop over all the individual .ab1 files and extract FASTA, I do get a result from all of them, but in some cases there is just a single bp left. I presume that this base is left in to avoid having empty outputs. For contigs I would expect a consensus read to contain the high-quality sequence found in any read, even if there is no matching read in the other direction.
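Eg:
- Forward OK, reverse OK -> Consensus from the two reads (a true contig)
- Forward OK reverse BAD or vice versa -> Consensus is simply the high quality part of the okay read
- Forward bad, reverse bad -> Empty (or single bp) sequence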
Will share data via mail asap
Hi Thomas,
I think your reasoning is about right, and what the package does will depend on certain settings you have (e.g. how many reads need to overlap a position for that base to be included in the contig).
Once you send us the data we'll take a look and get back to you. I really appreciate your willingness to take the time to engage and help - it's the only way the package will improve!
Rob
On Tue, 24 Nov 2020 at 00:14, Thomas Källman notifications@github.com wrote:
Hi,
My reasoning came from the fact that for some samples I only had forward data, and a subset of these were of very low quality. If I just loop over all the individual .ab1 files and extract FASTA, I do get a result from all of them, but in some cases there is just a single bp left. I presume that this base is left in to avoid having empty outputs. For contigs I would expect a consensus read to contain the high-quality sequence found in any read, even if there is no matching read in the other direction.
Eg:
- Forward OK, reverse OK -> Consensus from the two reads (a true contig)
- Forward OK reverse BAD or vice versa -> Consensus is simply the high quality part of the okay read
- Forward bad, reverse bad -> Empty (or single bp) sequence
Will share data via mail asap
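The three cases in the list above can be written down as a small decision function. This is only a sketch of the expected behaviour as described in this comment, not sangeranalyseR's consensus algorithm; `classify_contig` and the `min_len` threshold of 20 bases are assumptions made for illustration.

```r
# Hypothetical sketch, not the package's algorithm.
# fwd_len / rev_len: bases left in each read after quality trimming
#                    (NA when that read does not exist)
# min_len:           assumed cut-off below which a read counts as "bad"
classify_contig <- function(fwd_len, rev_len, min_len = 20) {
  fwd_ok <- !is.na(fwd_len) && fwd_len >= min_len
  rev_ok <- !is.na(rev_len) && rev_len >= min_len

  if (fwd_ok && rev_ok) {
    "consensus of both reads"
  } else if (fwd_ok) {
    "forward read only"
  } else if (rev_ok) {
    "reverse read only"
  } else {
    "empty"
  }
}

both     <- classify_contig(450, 430)  # forward OK, reverse OK
fwd_only <- classify_contig(450, 1)    # reverse trimmed down to a single bp
neither  <- classify_contig(1, NA)     # poor forward read, no reverse read
```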
I have tried to create a minimal example to help identify the issue, but it looks like the problem only occurs when I have more than 200 ab1 files as input. If I split the analysis over two folders and run them separately, there are no longer any issues. But adding more than 200 to a single folder and running the analysis generates the following:
al <- SangerAlignment(parentDirectory     = "~/seqsfish/ab/test",
                      suffixForwardRegExp = "_[0-9]+_F+",
                      suffixReverseRegExp = "_[0-9]+_R+")
WARN [2020-09-12 18:15:38] The number of your total reads is 0.
Number of total reads has to be equal or more than 2 ('minReadsNum' that you set)
INFO [2020-09-12 18:15:38] Aligning consensus reads ...
INFO [2020-09-12 18:15:38] Before building!!
INFO [2020-09-12 18:15:40] After building!!
SUCCESS [2020-09-12 18:15:40] >> 'SangerAlignment' S4 instance is created !!
and then
writeFasta(al, outputDir = "~/seqsfish/ab/test")
INFO [2020-09-12 18:15:48] Your input is 'SangerAlignment' S4 instance
INFO [2020-09-12 18:15:48] >>> outputDir : /home/thomkall/seqsfish/ab/test
INFO [2020-09-12 18:15:48] Start to write 'SangerAlignment' to FASTA format ...
INFO [2020-09-12 18:15:48] >> Writing 'alignment' to FASTA ...
INFO [2020-09-12 18:15:48] >> Writing 'contigs' to FASTA ...
INFO [2020-09-12 18:15:48] >> Writing all single reads to FASTA ...
Error in vapply(object@contigList, function(contig) { :
values must be length 1,
but FUN(X[[10]]) result is length 2
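For readers unfamiliar with this message: it is the error base R's `vapply()` raises when the callback returns a vector of a different length than the declared template. The snippet below reproduces that class of error on a toy list and shows one way to locate the offending elements. It is a sketch only; `contig_list` and `get_name` are made-up stand-ins, and nothing here is taken from the package's internals.

```r
# Toy stand-in for a list of contigs: each element has a 'name' field.
# Element 10 deliberately carries two values, mimicking a doubled entry.
contig_list <- lapply(1:12, function(i) list(name = paste0("contig", i)))
contig_list[[10]]$name <- c("contig10", "contig10")

get_name <- function(contig) contig$name

# vapply() requires exactly one character value per element, so this fails;
# tryCatch() turns the error into its message so the script keeps running.
res <- tryCatch(
  vapply(contig_list, get_name, character(1)),
  error = function(e) conditionMessage(e)
)
print(res)

# lapply() has no length constraint, so it can be used to find the culprits
offenders <- which(lengths(lapply(contig_list, get_name)) != 1)
print(offenders)
```

Applied to a real object, the same `lengths(lapply(...))` pattern would show which contig returns more than one value, which may narrow things down when the input files cannot be shared.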
Hi @thokall, if you can send me or Howard your dataset (my contact: rob.lanfear@anu.edu.au) I'd be happy to take a look. We (of course) won't share your dataset with anyone. But it's really the only way we can debug things in a way that makes sure it will work for you.
I will check whether that is possible. I trust you not to spread the data, but since the data is not mine I need to get a green light from the owner, hence my attempt to create a minimal example and explore on my own.
Hello,
I get the same error from vapply; however, the contigs and alignment are written to file anyway. (I am on macOS Catalina, developer version of sangeranalyseR.)
I have noticed that in the report printed on screen, some chromatograms are doubled in number... e.g. below, AB4-3 shows with 2 forward reads and 2 reverse reads but in my folder each has only 1 chromatogram...
I don't mind sending my chromatograms to your email address if it helps
*>>Contig 'AB4-3':
SUCCESS [2021-21-01 20:24:26] * >> 2 forward reads.
SUCCESS [2021-21-01 20:24:26] * >> 2 reverse reads.
SUCCESS [2021-21-01 20:24:26]
* >> Contig 'AB4-32':
SUCCESS [2021-21-01 20:24:26] * >> 1 forward reads.
SUCCESS [2021-21-01 20:24:26] * >> 1 reverse reads.
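One pattern that would produce exactly these counts is a contig name that is a prefix of another contig name. Whether this is what happens inside the package is an assumption, not something confirmed in this thread; the file names below are made up and only the two contig names come from the report above.

```r
# Made-up file names for the two contigs mentioned in the report
files <- c("AB4-3_1_F.ab1", "AB4-3_1_R.ab1",
           "AB4-32_1_F.ab1", "AB4-32_1_R.ab1")

# Unanchored match: "AB4-3" is also the start of "AB4-32",
# so the reads of both contigs are picked up
loose <- files[grepl("AB4-3", files, fixed = TRUE)]

# Requiring the separator right after the name keeps the contigs apart
strict <- files[startsWith(files, "AB4-3_")]

length(loose)   # number of files attributed to 'AB4-3' by the loose match
length(strict)  # number of files attributed to 'AB4-3' by the strict match
```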
Hi @tomsauv,
Thanks for raising this issue. We'll work on it! I guess maybe the regular expression that you use matches both reads. I'll take a look after you send me the files. You can send me your files (kuanhao.chao@gmail.com) and that would be really helpful. We'll only use them for fixing this bug!
Howard
@Kuanhao-Chao, I guess this is something we hadn't considered (i.e. double counting).
Whether or not double counting is the issue in this case, we should add a test where double counting occurs. We should then add a check when parsing regular expressions that each read is assigned to one and only one group (e.g. forward, reverse, or the contig groups). If one or more reads could be assigned to more than one group, we should throw an error whose message tabulates, for each such read, all the groups it has been assigned to, and suggests that users supply a CSV file (with a link to the documentation on how to do it) if the regular expressions are not working.
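A check of this kind might look like the sketch below. It is an illustration in base R, not the fix that was later committed: `check_read_groups` is a hypothetical name, the file names are invented, and the two patterns are the ones posted earlier in this thread.

```r
# Hypothetical check, not sangeranalyseR code: every file must match
# exactly one of the supplied (named) regular expressions.
check_read_groups <- function(files, patterns) {
  # rows = files, columns = groups; TRUE where the pattern matches
  hits <- vapply(patterns, function(p) grepl(p, files),
                 logical(length(files)))
  hits <- matrix(hits, nrow = length(files), ncol = length(patterns),
                 dimnames = list(files, names(patterns)))

  bad <- rowSums(hits) != 1
  if (any(bad)) {
    # one line per problem read, listing every group it matched
    lines <- vapply(which(bad), function(i) {
      groups <- names(patterns)[hits[i, ]]
      paste0(files[i], ": ",
             if (length(groups)) paste(groups, collapse = ", ") else "<none>")
    }, character(1))
    stop("Reads not assigned to exactly one group:\n",
         paste(lines, collapse = "\n"),
         "\nConsider listing the reads in a CSV file instead.",
         call. = FALSE)
  }
  invisible(hits)
}

pats <- c(forward = "_[0-9]+_F+", reverse = "_[0-9]+_R+")

# Unambiguous names pass silently
ok <- check_read_groups(c("AB4-3_1_F.ab1", "AB4-3_2_R.ab1"), pats)

# A name matching both patterns, and one matching neither, are reported
msg <- tryCatch(
  check_read_groups(c("S1_1_F_2_R.ab1", "notes.ab1"), pats),
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```

Failing with an error rather than a warning follows the suggestion above; a real implementation would also need to cover the contig-name groups, which this sketch leaves out.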
Hi @tomsauv,
Thank you for your bug report. I've fixed the problem of repeated reads. Please install sangeranalyseR again from the latest master branch.
library(devtools)
install_github("roblanf/sangeranalyseR")
Let me know if there are any new problems. Thanks!
Howard
Runs without error! Thank you
I know this is closed, but I still want to mention that it now works as expected. Thanks to all who contributed, and sorry that I could not share my input earlier to help find the problem.
Hi,
I tried the .csv method and I get the vapply error. As in my earlier message of Jan 21, I noticed that "2 forward reads and 2 reverse reads" shows up. The assembly completes, but then I get an error when exporting with writeFasta.
However, when I run the same dataset with the regex method, I get no issue. None of the forward/reverse reads shows as 2, only 1s, and exporting with writeFasta works.
Would the fix you did earlier only apply to the regex method and not the csv method?
This time I am on a PC with the latest developer version installed (trying to use sangeranalyseR for teaching...).
Thanks, Thomas
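While the CSV code path is being looked at, one thing that can be ruled out on the user side is a read listed more than once in the CSV file. The sketch below assumes the three-column layout `reads`, `direction`, `contig`; please check that against the documentation, and note that the rows are invented for illustration. It does not test whether the earlier fix covers the CSV method.

```r
# Assumed CSV layout (verify against the documentation):
# one row per read with columns 'reads', 'direction' and 'contig'.
csv_text <- "reads,direction,contig
AB4-3_F.ab1,F,AB4-3
AB4-3_R.ab1,R,AB4-3
AB4-32_F.ab1,F,AB4-32
AB4-3_F.ab1,F,AB4-32"

reads_tbl <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Reads that appear in more than one row would be counted more than once
dup_reads <- unique(reads_tbl$reads[duplicated(reads_tbl$reads)])
dup_rows  <- reads_tbl[reads_tbl$reads %in% dup_reads, ]
print(dup_rows)
```

If `dup_reads` is empty for the real file, the doubled counts are unlikely to come from the CSV itself.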
Hi,
When running:
I'm getting the following error/warnings: