Open AlexGaithuma opened 2 years ago
hi,
Can you tar up the folder /scratch/user/akiarieg/Tickrnaseq/c4902.trinity.reads.fa.out (the directory passed to ParaFly -c in the failing command)
and send it to me? I'll take a look.
Send to 'bhaas at broadinstitute.org'
thx,
~brian
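A minimal sketch of how that folder could be bundled. The `tar_up` helper and the archive name `c4902_trinity_out.tar.gz` are hypothetical; only the directory path comes from the error message in this thread.

```shell
# tar_up DIR ARCHIVE: pack DIR into a gzip-compressed tarball, stored
# relative to its parent so it unpacks as a single top-level folder.
tar_up() {
    dir=$1
    archive=$2
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")"
}

# Directory from the error message; the archive name is a placeholder.
out_dir=/scratch/user/akiarieg/Tickrnaseq/c4902.trinity.reads.fa.out
if [ -d "$out_dir" ]; then
    tar_up "$out_dir" c4902_trinity_out.tar.gz
fi
```

Listing the archive with `tar -tzf c4902_trinity_out.tar.gz` before sending is a quick way to confirm it contains the chrysalis subfolder.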
On Fri, Nov 12, 2021 at 8:12 PM Alex Kiare Gaithuma wrote:
I have rerun Trinity several times after it failed to finish, but it seems to be stuck.
I am running it as: singularity exec -e /software/containers/trinityrnaseq.v2.13.2.simg Trinity. This is the stdout error I get every time.
I can't complete the assembly. This happens with many files as well. Please help!
I even tried running the command on an HPRC cluster, with no success:
singularity exec -e /software/containers/trinityrnaseq.v2.13.2.simg Trinity \
--single "/data1/IxoSca/Tickrnaseq/trinity_${prefix}/readpartitions/${prefix1}/${prefix2}/${prefix3}.trinity.reads.fa" \
--output "/data1/IxoSca/Tickrnaseq/trinity${prefix}/read_partitions/${prefix1}/${prefix2}/${prefix3}.trinity.reads.fa.out" \
--CPU 1 --max_memory 2G --run_as_paired --seqType fa --trinity_complete --full_cleanup --min_contig_length 12 --bflyCPU 4 --bflyGCThreads 4 --no_salmon
The error message is:
We are sorry, commands in file: [failed_butterfly_commands.44832.txt] failed. :-(
Error encountered:: <!----
CMD: /usr/local/bin/trinity-plugins/BIN/ParaFly -c /scratch/user/akiarieg/Tickrnaseq/c4902.trinity.reads.fa.out/chrysalis/butterfly_commands -shuffle -CPU 28 -failed_cmds failed_butterfly_commands.44832.txt 2>tmp.44832.1636660306.stderr
Errmsg:
seq vertex T:W-1(V26087_159_D21005) not selected yet and has pred count: 2
seq vertex AGGT:W636(V26088_7596_D21231) not selected yet and has pred count: 2
seq vertex C:W-1(V26089_164_D21458) not selected yet and has pred count: 1
seq vertex TA:W640(V26090_165_D21671) not selected yet and has pred count: 1
seq vertex T:W-1(V26091_167_D21879) not selected yet and has pred count: 1
seq vertex GA:W667(V26092_7604_D22077) not selected yet and has pred count: 1
seq vertex C:W-1(V26093_170_D22261) not selected yet and has pred count: 1
seq vertex G:W-1(V26094_7607_D22431) not selected yet and has pred count: 1
seq vertex C:W-1(V26095_7608_D22596) not selected yet and has pred count: 1
seq vertex GTATGCCCG:W282(V26096_7609_D22753) not selected yet and has pred count: 1
seq vertex CT:W700(V32401_26_D13628) not selected yet and has pred count: 2
. . . . continues....
ERROR: after topo sort, still have edge unaccounted for: Edge(240995->48380,w:1.0)
ERROR: after topo sort, still have edge unaccounted for: Edge(45410->45411,w:1.0)
ERROR: after topo sort, still have edge unaccounted for: Edge(45457->45458,w:1.0)
ERROR: after topo sort, still have edge unaccounted for: Edge(45740->45741,w:1.0)
ERROR: after topo sort, still have edge unaccounted for: Edge(32407->32408,w:1.0)
ERROR: after topo sort, still have edge unaccounted for: Edge(41914->41915,w:1.0)
ERROR: after topo sort, still have edge unaccounted for: Edge(45418->45419,w:1.0)
ERROR: after topo sort, still have edge unaccounted for: Edge(41956->41957,w:1.0)
ERROR: after topo sort, still have edge unaccounted for: Edge(32408->32409,w:1.0)
Exception in thread "main" java.lang.RuntimeException: Error, graph contains at least one cycle and is not a DAG!
    at TopologicalSort.topoSortSeqVerticesDAG(TopologicalSort.java:112)
    at TransAssembly_allProbPaths.ZipMergeRounds(TransAssembly_allProbPaths.java:2612)
    at TransAssembly_allProbPaths.link_residual_INTER_component_unique_nodes(TransAssembly_allProbPaths.java:2868)
    at TransAssembly_allProbPaths.convert_path_DAG_to_SeqVertex_DAG(TransAssembly_allProbPaths.java:2515)
    at TransAssembly_allProbPaths.create_DAG_from_OverlapLayout(TransAssembly_allProbPaths.java:1797)
    at TransAssembly_allProbPaths.main(TransAssembly_allProbPaths.java:967)
warning, cmd: java -Xmx10G -Xms1G -Xss1G -XX:ParallelGCThreads=28 -jar /usr/local/bin/Butterfly/Butterfly.jar -N 100000 -L 12 -F 500 -C /scratch/user/akiarieg/Tickrnaseq/c4902.trinity.reads.fa.out/chrysalis/Component_bins/Cbin2/c0.graph --path_reinforcement_distance=25 --NO_EM_REDUCE failed with ret: 256, going to retry.
--->
Trinity run failed. Must investigate error above.
View it on GitHub: https://github.com/trinityrnaseq/trinityrnaseq/issues/1092
Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas
Thanks @brianjohnhaas. I sent the file.
Hi Alex,
It looks like you've got a very complex component there that's pushing Butterfly to its limits. I've tried running it with a lot of RAM and I'm getting stack overflows. If all the other components are finishing ok, then try running Trinity with --FORCE to have it wrap up what it could assemble. You can take that remaining component's reads.fa file and try assembling it with something else. Please keep this issue open and I'll continue to explore this to see if I can tackle it for a future software update.
best,
~brian
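A minimal sketch of that rerun, assuming the original command is simply repeated with --FORCE appended. The `rerun_with_force` helper and the `RUN` dry-run switch are hypothetical conveniences, not part of Trinity, and the short input and output names stand in for the full paths used in the original command.

```shell
# rerun_with_force CMD [ARGS...]: repeat a Trinity command with --FORCE added.
# Dry run by default (prints the command); set RUN= (empty) to execute it.
rerun_with_force() {
    ${RUN-echo} "$@" --FORCE
}

# Options copied from the command posted earlier in this thread;
# the input and output names are shortened placeholders.
rerun_with_force singularity exec -e /software/containers/trinityrnaseq.v2.13.2.simg Trinity \
    --single c4902.trinity.reads.fa \
    --output c4902.trinity.reads.fa.out \
    --CPU 1 --max_memory 2G --run_as_paired --seqType fa \
    --trinity_complete --full_cleanup --min_contig_length 12 \
    --bflyCPU 4 --bflyGCThreads 4 --no_salmon
```

Printing the command first makes it easy to check that the output directory matches the earlier run before anything is executed.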
Hi Brian,
Just to get back to you on this: I investigated and found that this set of reads comes from a tandem repeat region of an intron (the data is tick RNA-seq data). That could help explain why it's problematic. All the reads match the Ixodes scapularis clone 01_E21 tandem repeat region: https://www.ncbi.nlm.nih.gov/nucleotide/GU318629.1?report=genbank&log$=nucltop&blast_rank=1&RID=TB1SRKYD013
Thanks, Alex. I saw that too -- the inchworm step reconstructed a couple reasonably long contigs and I blast'd one of them at ncbi. I'm sure the repeat structure coupled with any polymorphisms between repeat instances are largely causing trouble for Butterfly here.
Hi Brian,
I have the same issue. My Trinity run is stuck at 99.9998%:
- retaining Trinity transcripts provided as input to salmon, w/o filtering (pre-salmon mode).
succeeded(469410) 99.9998% completed.
Additionally, I got the error that Butterfly failed.
This is the command line I run:
module load bioinfo-tools trinity/2.9.1
module load bioinfo-tools jellyfish/2.2.6
Trinity --seqType fq --max_memory 110G \
--samples_file /proj/snic2022-23-541/Ticks_project/Analysis/Trinity/STAR/files_Trinity.txt \
--output ../Trinity_deNovo
I am working with RNA-seq data from another tick, similar to @AlexGaithuma; I have 128 GB of reads. Unfortunately, my job died because of the timeout. Chrysalis and inchworm seem ok.
Would you happen to have any suggestions to solve my problem?
Thanks!
It's probably an endosymbiont or pathogen in the tick that Trinity is attempting to assemble. If you can figure out which read cluster is difficult to assemble (i.e., you can see the command that's running via 'ps -auxww | grep Butterfly'), then you can try tackling that set of reads separately to see what it is, or look at its current inchworm contigs to see what's there (just blast long contigs at NCBI).
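As a rough sketch of that lookup: the Butterfly command logged earlier in this thread passes the component graph after -C, so that argument can be pulled out of the process listing. The `butterfly_graphs` helper is hypothetical, and it assumes the running command has the same shape as the logged one.

```shell
# Print the component graph path (the argument after -C) for every
# Butterfly command line read from standard input.
butterfly_graphs() {
    sed -n '/Butterfly\.jar/s/.* -C \([^ ]*\).*/\1/p'
}

# On the node running Trinity:
#   ps -auxww | butterfly_graphs

# Demonstration on the command line logged earlier in this thread:
echo 'java -Xmx10G -Xms1G -Xss1G -XX:ParallelGCThreads=28 -jar /usr/local/bin/Butterfly/Butterfly.jar -N 100000 -L 12 -F 500 -C /scratch/user/akiarieg/Tickrnaseq/c4902.trinity.reads.fa.out/chrysalis/Component_bins/Cbin2/c0.graph --path_reinforcement_distance=25 --NO_EM_REDUCE' | butterfly_graphs
```

The printed path points at the component bin whose reads and inchworm contigs are worth inspecting.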
To get the assembly job to just finish up, you can kill the current job and then rerun it with the --FORCE option. It won't do any more assembling but rather just wrap up what it could assemble.
hope this helps,
~b
Thanks!
@lachemontes I found that it is best to map to similar genomes first. Assemble the unmapped reads first (blast to confirm non-tick contigs) and use that assembly to filter out non-tick genes.
Hey! Thanks, @AlexGaithuma. Which species' genome did you map your reads to? I am working with Ixodes ricinus. If you are okay with sharing, which program are you using to map them? I use STAR. Would you happen to have a link you could share that shows how to do what you suggest? I have been trying to assemble my transcriptome for a month and a half without success. Thank you very much.
Hi @lachemontes, I use bbmap and map to several closely related genomes. First, build the reference indexes:
bbmap.sh \
threads=16 \
build=1 \
ref=Ref1.fasta
bbmap.sh \
threads=16 \
build=2 \
ref=Ref2.fasta
In your case, I would map to;
With bbmap, you retrieve the reads that did not map to the first genome, map them to the second genome, and so on:
for FNAME in "$DIR"/data/*_1.fastq.gz
do
    SAMPLE=$(basename "$FNAME" _1.fastq.gz)
    # Mate files use the same _1/_2 suffix that the glob above matches
    r1="$DIR/data/${SAMPLE}_1.fastq.gz"
    r2="$DIR/data/${SAMPLE}_2.fastq.gz"
    # Map against reference build 1: outm keeps the mapped reads, outu the unmapped ones
    bbmap.sh \
        in="$r1" \
        in2="$r2" \
        build=1 \
        threads=8 \
        maxindel=200k \
        xs=us \
        sam=1.3 \
        -Xmx10g \
        outm="$WORKDIR/Ref1/${SAMPLE}.mapped.fq" \
        outu="$WORKDIR/Ref1/${SAMPLE}.unmapped.fq" \
        statsfile="$WORKDIR/Ref1/${SAMPLE}.mapstats.txt"
done
Split the unmapped reads into paired files using bbmap's reformat.sh script:
for FNAME in "$DIR"/data/*_1.fastq.gz
do
    SAMPLE=$(basename "$FNAME" _1.fastq.gz)
    reformat.sh \
        in="$WORKDIR/Ref1/${SAMPLE}.unmapped.fq" \
        out1="$WORKDIR/Ref1/${SAMPLE}.unmapped.1.fq" \
        out2="$WORKDIR/Ref1/${SAMPLE}.unmapped.2.fq"
done
Map the unmapped reads to the second genome, then do the same with the third genome.
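A sketch of what the second round could look like, reusing the options from the first loop with build=2 and the split unmapped files as input. The `map_unmapped_to_ref2` helper, the Ref2 output folder, and the `RUN` dry-run switch are assumptions for illustration, not something posted in this thread.

```shell
# map_unmapped_to_ref2 DATA_DIR WORK_DIR
# Map the reads left unmapped after the first genome against reference build 2.
# Dry run by default (prints each command); set RUN= (empty) to execute.
map_unmapped_to_ref2() {
    data_dir=$1
    work_dir=$2
    ${RUN-echo} mkdir -p "$work_dir/Ref2"
    for FNAME in "$data_dir"/*_1.fastq.gz
    do
        [ -e "$FNAME" ] || continue   # skip when the glob matches nothing
        SAMPLE=$(basename "$FNAME" _1.fastq.gz)
        ${RUN-echo} bbmap.sh \
            in="$work_dir/Ref1/${SAMPLE}.unmapped.1.fq" \
            in2="$work_dir/Ref1/${SAMPLE}.unmapped.2.fq" \
            build=2 \
            threads=8 \
            maxindel=200k \
            xs=us \
            sam=1.3 \
            -Xmx10g \
            outm="$work_dir/Ref2/${SAMPLE}.mapped.fq" \
            outu="$work_dir/Ref2/${SAMPLE}.unmapped.fq" \
            statsfile="$work_dir/Ref2/${SAMPLE}.mapstats.txt"
    done
}

map_unmapped_to_ref2 "${DIR:-.}/data" "${WORKDIR:-.}"
```

A third genome would follow the same pattern with build=3, after splitting the Ref2 unmapped reads with reformat.sh.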
Check the final unmapped reads. There should be far fewer of them. You can assemble them separately and blast the contigs to see whether any tick sequences remain. If there are, just map the reads to that assembly and retrieve them.
Concatenate all reads that map to tick sequences and, finally, assemble them.
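The pooling step might look like the sketch below, which assumes each round wrote its mapped reads to a RefN folder under the working directory as `SAMPLE.mapped.fq`. The `pool_mapped` helper, the `tick` output folder, and the `.tick.fq` suffix are hypothetical names.

```shell
# pool_mapped WORK_DIR SAMPLE
# Concatenate one sample's mapped reads from every reference round
# (Ref1, Ref2, ...) into a single FASTQ file.
pool_mapped() {
    work=$1
    sample=$2
    mkdir -p "$work/tick"
    cat "$work"/Ref*/"$sample".mapped.fq > "$work/tick/$sample.tick.fq"
}

# Pool every sample that produced mapped reads in the first round.
WORKDIR=${WORKDIR:-.}
for f in "$WORKDIR"/Ref1/*.mapped.fq
do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    pool_mapped "$WORKDIR" "$(basename "$f" .mapped.fq)"
done
```

Because bbmap writes paired reads interleaved when a single output file is given, the pooled files would still need splitting with reformat.sh before being passed to the assembler as left and right reads.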
@AlexGaithuma, thank you so much for your suggestion. I will try it, and I hope this approach works for me! By the way, the Ixodes ricinus assembly (ASM97304v2) is highly fragmented and has only 20% completeness for single-copy BUSCO genes, so I don't recommend using it for further analysis.