Open melop opened 8 years ago
Hi Rongfeng,
The feature is kind of there. To do this, you need to run "rascaf" with "-cs" option to output the sequence supporting the connection of two contigs.
As a result, you will have a file look like "*_cs.fa". Then you can blast those sequences against refseq_rna database. If you don't have the blast database locally, you can use the "-remote" option. For example: "blastn -db refseq_rna -task dc-megablast -query rascaf_cs.fa -out blast.out -outfmt 6 -remote -max_target_seqs 20" This procedure is quite slow and assumes you have blastn on your machine.
After that you can run the "EvaluateBlastResults.pl" script as: "perl ./EvaluateBlastResults.pl blast.out rascaf_cs.fa". The output will show the validation for the sequence from rascaf_cs.fa file. 1: valided. 0: uncertain. -1: misorientation -2: misorder -3: chimeric -4: no match in the blast database.
You can then adjust the corresponding connection in the rascaf.out file based on the validation result.
Thanks, Li
Dear Li,
Thank you for the quick reply!
For the last step, is there a way to automate the process? I could write a script for it in case you don't already have one.
Another question is why not use tblastn to make the alignment at the amino acid level?
Thanks Ray On Nov 28, 2016 8:58 PM, "Li Song" notifications@github.com wrote:
Hi Rongfeng,
The feature is kind of there. To do this, you need to run "rascaf" with "-cs" option to output the sequence supporting the connection of two contigs.
As a result, you will have a file look like "*_cs.fa". Then you can blast those sequences against refseq_rna database. If you don't have the blast database locally, you can use the "-remote" option. For example: "blastn -db nt -task dc-megablast -query rascaf_cs.fa -out blast.out -outfmt 6 -remote -max_target_seqs 20" This procedure is quite slow and assumes you have blastn on your machine.
After that you can run the "EvaluateBlastResults.pl" script as: "perl ./EvaluateBlastResults.pl blast.out rascaf_cs.fa". The output will show the validation for the sequence from rascaf_cs.fa file. 1: valided. 0: uncertain. -1: misorientation -2: misorder -3: chimeric -4: no match in the blast database.
You can then adjust the corresponding connection in the rascaf.out file based on the validation result.
Thanks, Li
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mourisl/Rascaf/issues/2#issuecomment-263377020, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlYwTB6WLOeovbvvSrLm55e6blEQ6Ayks5rCzJ3gaJpZM4K9yGt .
Sorry I meant tblastx. On Nov 28, 2016 9:33 PM, "Ray Cui" platycerus@gmail.com wrote:
Dear Li,
Thank you for the quick reply!
For the last step, is there a way to automate the process? I could write a script for it in case you don't already have one.
Another question is why not use tblastn to make the alignment at the amino acid level?
Thanks Ray On Nov 28, 2016 8:58 PM, "Li Song" notifications@github.com wrote:
Hi Rongfeng,
The feature is kind of there. To do this, you need to run "rascaf" with "-cs" option to output the sequence supporting the connection of two contigs.
As a result, you will have a file look like "*_cs.fa". Then you can blast those sequences against refseq_rna database. If you don't have the blast database locally, you can use the "-remote" option. For example: "blastn -db nt -task dc-megablast -query rascaf_cs.fa -out blast.out -outfmt 6 -remote -max_target_seqs 20" This procedure is quite slow and assumes you have blastn on your machine.
After that you can run the "EvaluateBlastResults.pl" script as: "perl ./EvaluateBlastResults.pl blast.out rascaf_cs.fa". The output will show the validation for the sequence from rascaf_cs.fa file. 1: valided. 0: uncertain. -1: misorientation -2: misorder -3: chimeric -4: no match in the blast database.
You can then adjust the corresponding connection in the rascaf.out file based on the validation result.
Thanks, Li
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mourisl/Rascaf/issues/2#issuecomment-263377020, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlYwTB6WLOeovbvvSrLm55e6blEQ6Ayks5rCzJ3gaJpZM4K9yGt .
For the last step, I'm sorry we don't have the script to remove the connections from rascaf.out now. If you can wait for 1 day, I can implement it now (this is on my todo list anyway).
I think tblastx would work as long as the output is the same as blastn (-outfmt 6).
Thanks, Li
Hi Li,
no problem, please take your time. Let me know when this script is
ready.
If I want to use a protein database, like taxonomic specific
uniprot, is it also possible (with blastx -outfmt 6)? Best Regards, Ray
On Mon, Nov 28, 2016 at 9:45 PM, Li Song notifications@github.com wrote:
For the last step, I'm sorry we don't have the script to remove the connections from rascaf.out now. If you can wait for 1 day, I can implement it now (this is on my todo list anyway).
I think tblastx would work as long as the output is the same as blastn (-outfmt 6).
Thanks, Li
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mourisl/Rascaf/issues/2#issuecomment-263388955, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlYwQmm2w5e4FakGKt0RJ55a1B9g0JYks5rCz2CgaJpZM4K9yGt .
I just tested blastx with one sequence and I believe the script will work fine with blastx.
Ok, thank you for testing!
Ray On Nov 28, 2016 10:26 PM, "Li Song" notifications@github.com wrote:
I just tested blastx with one sequence and I believe the script will work fine with blastx.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mourisl/Rascaf/issues/2#issuecomment-263399295, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlYwUSjKUN_UczTq1RcLu22woQf-WTDks5rC0cBgaJpZM4K9yGt .
I just uploaded the script "FilterRascafConnection.pl". You can use it after obtaining the validation results from blast. The commands is like: "perl FilterRascafConnection.pl blast_eval.out rascaf_cs.fa rascaf.out > rascaf_filter.out" .
Please let me know whether this works.
Thanks, Li
Dear Li,
thank you for your fast response. I will test this script.
In your paper you have compared Rascaf to two other RNA scaffolders.
I am aware of an unpublished scaffolder called BESST_RNA. Out of curiosity I'm comparing results between them. So far it seems that BESST_RNA yielded a somewhat higher N50 (input N50: 815,196, BESST_RNA N50: 910,428, Rascaf N50: 848,421), and a slightly better BUSCO performance: (input complete BUSCOs: 4380, BESST_RNA: 4410, Rascaf: 4397). BESST_RNA requires merging of all bam files into a single input. Other than that I set a minimum link number to 4 for both programs. I'm wondering if you would have time to perform some systematic comparisons between the two regarding accuracy and performance.
Best Regards, Ray
On Tue, Nov 29, 2016 at 4:25 AM, Li Song notifications@github.com wrote:
I just uploaded the script "FilterRascafConnection.pl". You can use it after obtaining the validation results from blast. The commands is like: "perl FilterRascafConnection.pl blast_eval.out rascaf_cs.fa rascaf.out > rascaf_filter.out" .
Please let me know whether this works.
Thanks, Li
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mourisl/Rascaf/issues/2#issuecomment-263466183, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlYwRQbMltGttxeXU1IsBA96NW-iom9ks5rC5s7gaJpZM4K9yGt .
Thanks for letting me know the comparison. I tried to run BESST_RNA, but it crashed. It is likely a problem about python on our sever. I will try it on a different computer later.
Dear Li,
thanks for looking at this.
Best Regards, Ray
On Tue, Nov 29, 2016 at 11:15 PM, Li Song notifications@github.com wrote:
Thanks for letting me know the comparison. I tried to run BESST_RNA, but it crashed. It is likely a problem about python on our sever. I will try it on a different computer later.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mourisl/Rascaf/issues/2#issuecomment-263717325, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlYwfEQgBgqBJKRiSYHENJhFFeXpGAWks5rDKP3gaJpZM4K9yGt .
Hello,
Best Regards, Rongfeng Cui Max-Planck Institute for the Biology of Ageing.