Assuming a single file still fails, can you please post a link to one single file that fails and one single file from another species that works? Must they be in FASTQ format?
These worked with P.evermanni and ShortStack previously:
The following do NOT work with P.evermanni and ShortStack:
The following work with P.meandrina and ShortStack:
Have you considered going back with evermanni from raw -> merged?
I don't follow.
Skip trimming?
No - just doing everything again - pretending you just got the raw data.
Inspected FastQ files and did not notice any difference - though MultiQC still worth a shot.
The main differences between the two FASTQ files you've provided are:
Index Sequence: Each read entry has a tag that indicates which sample or experiment it belongs to. In the first file, the reads are tagged with the index GGTAGCAT, while in the second file, the index used is GTGGCCAT.
Read Length Variability: Both files contain reads of varying lengths, but the specific lengths, and how frequently they occur, differ between the files.
Quality Score Variations: The FASTQ format includes a quality score for each base, providing information about the probability of an incorrect base call. The two files show some variations in these quality scores (denoted by characters such as F, :, #, etc.). This could suggest differences in sequencing quality, or different sequencing technology settings or calibration.
These differences can affect downstream analysis, such as alignment and quantification, and should be considered during data preprocessing and analysis.
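For reference, a quick way to tally the index tags and read-length distributions of two FastQ files is something like the sketch below. It assumes standard Illumina headers where the index is the last colon-delimited field, and the file names are hypothetical.

```python
import gzip
from collections import Counter

def open_maybe_gz(path):
    """Open plain-text or gzip-compressed files transparently."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)

def summarize_fastq(path):
    """Tally index tags (from headers) and read lengths across a FASTQ file."""
    indexes, lengths = Counter(), Counter()
    with open_maybe_gz(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:      # header line, e.g. @...:... 1:N:0:GGTAGCAT
                indexes[line.strip().split(":")[-1]] += 1
            elif i % 4 == 1:    # sequence line
                lengths[len(line.strip())] += 1
    return indexes, lengths

# Example (hypothetical file names):
# idx, lens = summarize_fastq("evermanni_73_merged.fastq.gz")
# print(idx.most_common(5), sorted(lens.items()))
```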
No - just doing everything again - pretending you just got the raw data.
Gotcha.
MultiQC still worth a shot.
What do you mean by this?
Pretty sure there's FastQC/MultiQC reports for all FastQs, at all stages.
Did re-wrangling from raw data impact anything?
Nope.
I've re-trimmed, as well as modified trimming parameters (mostly final read length), and it hasn't made a difference.
I've added a bunch of print statements to the source code to try to see precisely where things are going awry. This has sort of been useful, but not really. I know which variable(s) aren't populating, leading to the error, but it's a slog to work my way backwards through the code to try to figure out which input file(s) are being processed, so that I might be able to look through those.
Admittedly, I'm getting a bit burnt out on trying to troubleshoot this. It's very tedious and time-consuming. Each re-run of the code takes about 20 mins to reach the error.
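For what it's worth, the kind of temporary tracing I've been adding looks roughly like the sketch below; the decorated function is a hypothetical stand-in, not ShortStack's actual internals.

```python
import functools
import logging

logging.basicConfig(filename="trace.log", level=logging.DEBUG,
                    format="%(asctime)s %(message)s")

def trace_args(func):
    """Log each call's arguments and flag empty/falsy return values."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.debug("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        if not result:
            logging.debug("%s returned an empty/falsy result", func.__name__)
        return result
    return wrapper

@trace_args
def parse_readfile(path):   # hypothetical stand-in for a ShortStack function
    return [line for line in open(path) if line.startswith("@")]
```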
I am going to try a few things - if that does not work we will move to just R1 - will send you files to try today.
R1 only doesn't work with P.evermanni, either...
First in series of files to try - Groomed 47 merged https://usegalaxy.org/api/datasets/f9cad7b01a47213562fe7b27a2369930/display?to_ext=fastqsanger
Have we considered that the evermanni genome is the problem? Run the evermanni reads on a different genome.
Have we considered that the evermanni genome is the problem?
Yes, definitely considered. However, we CAN run ShortStack with P.evermanni with the "original" trimming params. I even re-ran it this week to confirm that it (still) works.
Run the evermanni reads on a different genome.
Sure, I'll do this!
Will ShortStack work on FastA?
Just glanced at documentation, and yes, FastA formatted reads are accepted as inputs.
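If it's useful, a dependency-free FastQ-to-FastA conversion is just a reformatting of each record; a rough sketch below (file names are hypothetical):

```python
import gzip

def fastq_to_fasta(fastq_path, fasta_path):
    """Write every 4-line FASTQ record as a 2-line FASTA record."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as fq, open(fasta_path, "w") as fa:
        for i, line in enumerate(fq):
            if i % 4 == 0:       # @header -> >header
                fa.write(">" + line[1:])
            elif i % 4 == 1:     # sequence line
                fa.write(line)

# fastq_to_fasta("merged_73.fastq.gz", "merged_73.fasta")  # hypothetical names
```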
Groomed 47 merged
Was this done with trimmed reads?
Groomed 47 merged
Was this done with trimmed reads?
Eh! Never mind. Just looked at FastQ and see that they're trimmed to 25bp?
Here is a fasta of 73 based on your merged fastq - https://usegalaxy.org/api/datasets/f9cad7b01a472135fe471ba2ddeb7983/display?to_ext=fasta.gz
give that a try
Here is a Groomed 73 merged to try - https://usegalaxy.org/api/datasets/f9cad7b01a472135de52bdc05fda83a2/display?to_ext=fastqsanger
Here is new interlaced (merged) 73 https://usegalaxy.org/api/datasets/f9cad7b01a472135fc6850d35af09e48/display?to_ext=fastqsanger.gz
First in series of files to try - Groomed 47 merged https://usegalaxy.org/api/datasets/f9cad7b01a47213562fe7b27a2369930/display?to_ext=fastqsanger
Completed successfully.
Original trim length was 25bp, which also had run successfully. So, maybe something there?
I tried a 30bp trim yesterday, which failed...
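If it comes down to systematically comparing trim lengths, one quick option is to hard-truncate an existing FastQ to a fixed length instead of re-trimming from raw each time. A rough sketch (file names are hypothetical; not what was actually used):

```python
import gzip

def truncate_fastq(in_path, out_path, max_len=25):
    """Hard-truncate sequence/quality strings to max_len; drop shorter reads."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as fin, open(out_path, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0]:
                break
            header, seq, plus, qual = (r.rstrip("\n") for r in record)
            if len(seq) >= max_len:
                fout.write(f"{header}\n{seq[:max_len]}\n{plus}\n{qual[:max_len]}\n")

# truncate_fastq("merged_73.fastq.gz", "merged_73.25bp.fastq", max_len=25)
```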
Here is new joined (merged) 73 https://usegalaxy.org/api/datasets/f9cad7b01a472135efc8e5f9433aa51b/display?to_ext=fastqsanger.gz
Here is a fasta of 73 based on your merged fastq - https://usegalaxy.org/api/datasets/f9cad7b01a472135fe471ba2ddeb7983/display?to_ext=fasta.gz
give that a try
This one failed with the same error.
@sr320 successfully ran one of the original fastp 31bp merged reads (I believe the 73 sample) on his laptop via the command line. Additionally, the developer responded to my issue (GitHub issue) and was successful in running all three samples on his computer, via the command line.
So, I'll give the command line a rip on Raven and see how it goes. If that fails, I'll run it on my laptop and see how it goes. And, if that all fails, we know @sr320 can run it on his computer, if needed.
Unbelievably, this ran successfully on Raven, via the command line!
So weird...
Gah! Spoke too soon!!!
The command @sr320 ran (as well as the developer's) omitted an option I had been using, --dn_mirna, for de novo sRNA prediction. Once I add that back in, the command fails... :cry:
EDITED: Fixed option.
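For reference, the two invocations being compared look roughly like the sketch below. Only --dn_mirna comes from this thread; the other flags and paths are my best recollection of ShortStack 4's options and should be checked against ShortStack --help.

```python
import subprocess

genome = "Porites_evermanni.fasta"   # hypothetical path
reads = "merged_73.fastq.gz"         # hypothetical path

base = ["ShortStack", "--genomefile", genome, "--readfile", reads, "--threads", "8"]

# Ran cleanly for @sr320 and the developer (no de novo miRNA prediction):
subprocess.run(base + ["--outdir", "out_no_dn"], check=True)

# Fails for me once de novo miRNA prediction is requested:
subprocess.run(base + ["--outdir", "out_dn", "--dn_mirna"], check=True)
```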
And if we leave it out?
Then, the results aren't comparable to the other two species we've run?
Go ahead and try the other files I made above, and also try a different genome if not already done.
Amazingly, the developer found the bug and has fixed it! (GitHub Issue)
He's indicated he'll put it in the next release, which will be "soon." Not sure how soon that actually is.
I'll glance at the fix and see if I can incorporate the changes myself.
Alrighty, I implemented the changes mentioned in the developer's comment and have successfully run ShortStack on P.evermanni!
I've encountered the above error when using ShortStack. I've already described the issue on the developer's repo, but haven't gotten a response yet. If anyone is willing to glance at the details shown in that issue, I'd greatly appreciate it.
The truncated version of the error is:
The odd thing is this error is only occurring using the P.evermanni genome and only when using R1-only reads or merged reads.
ShortStack runs fine using unmerged R1 and R2 reads.
I'd also like to add that ShortStack works without any issues, regardless of input reads (R1/R2, R1 only, merged), on the two other species we've looked at.
If anyone has any suggestions as to how to approach this, I'd greatly appreciate it.
Maybe I'll try each individual FastQ file, one at a time, and see if there's some problematic read(s) in one of them?
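A rough sketch of the per-file sanity check I have in mind: flag any record whose sequence and quality strings differ in length, or that is unexpectedly short (the file name is hypothetical):

```python
import gzip

def find_suspect_reads(path, min_len=15):
    """Return (record number, header, seq length, qual length) for odd records."""
    opener = gzip.open if path.endswith(".gz") else open
    suspects, record_num = [], 0
    with opener(path, "rt") as fh:
        while True:
            record = [fh.readline() for _ in range(4)]
            if not record[0]:
                break
            record_num += 1
            header, seq, plus, qual = (r.rstrip("\n") for r in record)
            if (not header.startswith("@")) or len(seq) != len(qual) or len(seq) < min_len:
                suspects.append((record_num, header, len(seq), len(qual)))
    return suspects

# for rec in find_suspect_reads("sRNA_73_R1.fastq.gz"):   # hypothetical name
#     print(rec)
```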