mprincipato opened 10 months ago
Hi @mprincipato
The CESAR distribution provides a few test samples. Could you please try running CESAR on a couple of them? For example: https://github.com/hillerlab/CESAR2.0/blob/master/extra/example4.fa
Then please try `cat XXX.fa | ./cesar /dev/stdin`.
I suspect the current implementation is not quite compatible with WSL (but I need to check the docs; there is probably a simple way to adjust the subprocess calls to make it work).
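The comparison suggested above can also be scripted. The sketch below uses a hypothetical helper, `compare_modes`, which is not part of CESAR or TOGA. It runs CESAR once on the file path and once with the same data fed through a pipe to `/dev/stdin` (like `cat XXX.fa | ./cesar /dev/stdin`), then reports whether both runs succeeded with identical output. It deliberately uses a pipe rather than redirecting the file with `<`, because `/dev/stdin` backed by a pipe is not seekable, whereas a redirected regular file is.

```python
import subprocess


def compare_modes(cesar_bin: str, fasta_path: str):
    """Run CESAR on a FASTA path, then on the same data piped to /dev/stdin.

    Hypothetical diagnostic helper; returns (outputs_match, direct, piped).
    """
    # Mode 1: pass the path directly, e.g. ./cesar extra/example4.fa
    direct = subprocess.run([cesar_bin, fasta_path],
                            capture_output=True, text=True)
    # Mode 2: send the same text through a pipe, which mimics
    # cat extra/example4.fa | ./cesar /dev/stdin
    with open(fasta_path) as handle:
        fasta_text = handle.read()
    piped = subprocess.run([cesar_bin, "/dev/stdin"], input=fasta_text,
                           capture_output=True, text=True)
    # Both runs must succeed and print identical alignments.
    outputs_match = (direct.returncode == 0 and piped.returncode == 0
                     and direct.stdout == piped.stdout)
    return outputs_match, direct, piped
```

For example, `compare_modes("./cesar", "extra/example4.fa")` returns a tuple whose first element is `True` only when both runs exit with code 0 and print the same alignment; the `stderr` of a failing run stays available on the returned `CompletedProcess` objects.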
Thank you, @kirilenkobm !
I spotted one thing in the documentation (Python 3.11 was recommended, but my default version was Python 3.10.12) and went down an ill-advised rabbit hole trying to set my default `python3` to 3.11, which wound up breaking things further. I think I was able to undo the changes, though, and these results are from running CESAR after that!
example1.fa
m@MPC:~/code/TOGA/CESAR2.0$ ./cesar extra/example1.fa
>referenceExon
gCCTGGGAACTTCACCTACCACATCCCTGTCAGTAGTGGCACCCCACTGCACCTCAGCCTGACTCTGCAGATGaa
>mouse
ctttcctcatttcctcaggcttcagtatagcatgaggctgaggaggagagagggagaccggcaaagtggccttgcttaggtaccatctttgcccctttagGCTTGGCAACTTCACCTACCACATCCCTGTCAGCAGCAGCACACCACTGCACCTCAGCCTGACCCTGCAGATGAAgtgagtgctggtgtgtgggtatgtgtgggggaccatgtggaagccctcagaaaagtgaaagccaagtgcttactaaatttattacgtggagggtccaggc
example4.fa
m@MPC:~/code/TOGA/CESAR2.0$ ./cesar extra/example4.fa
>referenceExon
---CCCCTGTCCCGCTGGTTGAGATCTGTGGGGGTCTTCCTGCTGCCAGCCCCCTACTGGGCACCCCGGGAGAGGTGGCTGGGTTCCCTACGGCGGCCCTCCCTGGTGCACGGGTACCCAGTCCTG---GCCTGGCACAGTGCCCGCTGCTGGTGCCAAGCGTGGACAGAGGAACCTcg aGCCCTTTGCTCCTCCCTCAGAATGAACGGAGACCAGAATTCAGATGTTTATGCCCAAGAAAAGCAGGATTTCGTTCAGCACTTCTCCCAGATCGTTAGGGTGCTGACTGAGGATGAGATGGGGCACCCAGAGATAGGAGATGCTATTGCCCGGCTCAAGGAG GTCCTGGAGTACAATGCCATTGGAGGCAAGTATAACCGGGGTTTGACGGTGGTAGTAGCATTCCGGGAGCTGGTGGAGCCAAGGAAACAGGATGCTGATAGTCTCCAGCGGGCCTGGACTGTGGGCTGGTGTGTGGAACTG CTGCAAGCTTTCTTCCTGGTGGCAGATGACATCATGGATTCATCCCTTACCCGCCGGGGACAGATCTGCTGGTATCAGAAG>>>>>>>>>>>>>>>>>>>CCGGGCGTGGGTTTGGATGCCATCAATGATGCTAACCTCCTGGAAGCATGTATCTACCGCCTGCTGAAGCTCTATTGCCGGGAGCAGCCCTATTACCTGAACCTGATCGAGCTCTTCCTGCAG>>>>>>>>>>>>>>>>>>>AGTTCCTATCAGACTGAGATTGGGCAGACCCTGGACCTCCTCACAGCCCCCCAGGGCAATGTGGATCTTGTCAGATTCACTGAAAAGag gTACAAATCTATTGTCAAGTACAAGACAGCTTTCTACTCCTTCTACCTTCCTATAGCTGCAGCCATGTACATG GCAGGAATTGATGGCGAGAAGGAGCACGCCAATGCCAAGAAGATCCTGCTGGAGATGGGGGAGTTCTTTCAGATTCAG GATGATTACCTTGACCTCTTTGGGGACCCCAGTGTGACCGGCAAAATTGGCACTGACATCCAGGACAACAAATGCAGCTGGCTGGTGGTTCAGTGTCTGCAACGGGCCACTCCAGAACAGTACCAGATCCTGAAG GAAAATTACGGGCAGAAGGAGGCTGAGAAAGTGGCCCGGGTGAAGGCGCTATATGAGGAGCTGGATCTGCCAGCAGTGTTCTTGCAATATGAGGAAGACAGTTACAGCCACATTATGGCTCTCATTGAACAGTACGCAGCACCCCTGCCCCCAGCCGTCTTTCTGGGGCTTGCGCGCAAAATCTACAAGCGGAGAAAG
>rn6
aatagtgtcactttaggaccttccatcccataattctccgtaccgaattgtagtgaacgcagctttggaagtagaaagcgttcagttttagccatttgcggtttaagtgaagatctgttggctctcagtcactccgatatactttttatttcctgtgtgttatttatgtaattcttgaatttagaaacagggtggggcattctaatgcattctaaaagcttgggttgcggggtaccttaggtaacagccaccaagcatctgccttcggtgcttgctcctgcagggagtgcttggtgtcccccacccccatgcccacccaggATGCCCCTGTCCCGCTGGCTGAGATCTCTGGGGGTCTTCCTGCTGCCAGCCCCCTGCTGGGCACCCCGGGAGAGGTGGCTTGGTTACCTACAACGACCCTCCCTGGCATATGGGTGTCCAGTCCTTGGGGCTTGGCACAGTGCCCGCTGGTGGTGCCAAGTGTGGACAGAGGAGCCTCGgtgagtgtggttggggtgtggggcttcggggagggaagcaccggagccaccctctcactcatttgtgtgttttttccctcagAGCATTTAGCTCCTCTGTCAGAATGAATGGGGACCAGAAACTGGATGTTCATAACCAAGAAAAGCAGAATTTCATCCAGCACTTCTCCCAGATTGTCAAGGTGCTGACTGAGGATGAACTGGGACACCCAGAGAAGGGAGATGCTATTACCCGGATCAAAGAGgtgagggattcaggactgaagaagtcagtagaggtgtggattcgctgccagggttttgataagagcagaagtaacgtttttacctgtgggtcccgactgtacttcaacccacccagatatttggctccctgactttggaccatcccatcagctttgcttgagagcagagccctggacgtcatggtttgattttatgccctgagactataagataggatcttagcagataccaaaaggctgttggcactgaggcctgagggaagttactgtattgttaaagtctgctctaaaaagggagttgttcctcttctgcttgagttacctatttccctccagtctcttgagtttattttctttttttttggggggggggggtctttttttcggagctggggaccgaacccagggccttccgcttcctaggtaagcgctctaccactgagctaaatccccagccccttgagtttattttcttttcttttttttttttttttttttggttctctttttcggagctggggaccgaacccagggccttgtgcttcctaggtaagcgctctaccactgagctaaatccccagccccttgagtttattttcttaagacagagtttcaatttcttttttttttttttttttttttttttttggttcttttttcggagctggggaccgaacccagggcctttcgcttcctaggcaagcgctctaccactgaactaaatccccagccccgagtttcaatttcttttttttttttttttttttttttttggttcgttttttcggagctggggaccgaacccagggccttgcgcttcctaggtaagtgctctaccactgagctaaatccccagccccgagtttcaatttctaaacccacatgtcctcacacacacaaccccctgcctcagcctactgtgtgctgtatcaattacagtacatgtctagattgggaattctttttttttttttttttcaaaaaaagatttattatatataagaacactgtagctgtcttcagatacaccagaagaaggcatcggatcccattacagatggttgtgagccaccatgtggttgctgggaattgaactcagaacctctggaagagcagtcagtgctcttaaccactgaaccatctctccagcccaggaaacttttttttttttttttttttttttttttttttttttttttttttttttttgggttctttttttcggagctggggaccgaacccagggccttgcgct
tcctaggtaagcgctctaccactgagctaaatccccagcccccaggaaacttttttttaaagacatacattttttttaaaaaattgtgcattgactttttgcctgcctctgtgtgtgtgtgtgtgtgtgtacacacttgtgagctgccatgtaggtgctgggaattgaacccaggtcctctggaggggcagtcagtgctcttcactgccaagccatctctagcctggggaattcttttttttataaatgtctcatctttgaggaaaacaccaaggagaattttctttgtctatactgtgcaggtattaaagagctatctctttagttagttagggctgctgtctggcagaaggcacagttgagaaggctccaggaggaggacattagaagggttagacgtagccctgactcttgctgtcccttcaatgccttcttttggctaactttggagaagccttgtcttttgttttgtttctttgttttctgtagccaaggctgtccagaactcattctgtatctgaggctggcctcatactcatcagtctgcctgcctccctgctgagtattgagcttataggagttagccatcactctacgctagggcccttgtcttcactacaagcttgtttgcaggtggtccaagttctcggcttgagggaagaactgggggtatgttatgtagcaggtacttgggttccatatgctataggactttggagaagaaatgaagaggggactgacccagggactccccaaggggagttggggttcaaagagggagaaggaaaggggaaacctcttcaggtcccttctgtgtggaccctgatgtcctctcaccctgcctcagGTCCTGGAGTACAACACTGTAGGAGGCAAGTACAATCGGGGTCTGACGGTGGTACAGACCTTCCAGGAACTGGTGGAACCAAGGAAACAGGATGCTGAGAGCCTACAGCGGGCCCTGACGGTGGGCTGGTGTGTAGAACTGgtaagagcggtcggagaagcagagttcccacagttggggcttccttggtgaggagcagaagcttgtttcttgcttgtgttgatgtgtgagtctcttgtgtgctacagtcattcgggcagtgtggatcccttcaggggctgacagatgggcacagagagttttcagaaatgctcaccttaaatatcaccatgtggtggtgtcccatgcctttgatcttatctcagggagcagaggcaagtggatctctatgagttctaggctagcctgatctatagagtaagttctaggatggccagggctactcaagaaaccctagtttcaaataacagaaaccttaaattcttaaacttaacaaagaagacatgccattaaaaaagaaagtagacttttaaagtctggttagatggcttggccggtgagagcatttgccatcagaagagccgaatgtgtgagccagggccgtagaaacccagttgtgtctctggacacagaatctgcatggtaggtagagccagcaggctggccaggctgcctggctcagtgagagagaccctgtttcaaaggaacattgatgtgatagacaggacacctggcatcttcctttggcttctccgcccctgttttcatgcccacactgaatgcacccaagtgtgtgcttggtggtacacatctgtaatacatacactcacctagtgctggggcatggagataggcgggtccccagagttcgctggcaagctaggttagccaatcagtcagttccgggctcagaaaggtttgtctcccaaaactgaggacacctgatactcacctcttgctgctacagatgcgcccagttcagatgcccatccatatatacaacttgtgacacagacacaagcacacaaggcctgctgggcaggcaggcaggcagacacacttgtgttcagtttctcactatctatatgacaactgtgtgcatctgttgcacccttcccggagttgggatagatgtgcagatcctggagcttgt
tggccagtctgaccatttcagtgaccaccctgtctcaaagaataaggtggagaggctggagagatgggccagtgtttaagagctctggctgcttttccagaggaccggggttcagtggttagggatctgacaccctcttctggccttcatgggcaccaggcatgcacatggttagcagacagacttgcagaccagcaccagtgcatttaaaataagtagtcctctgtttggcagcacatatactgaaattggaacaatacagagaagattagcttggcccctttgtaaggatgacacaaattcatgaggcagtcactctctttttttcctttctgtcttttcatttctatcaaatgtgtgaaaaaatacataaagattggttataaccaaaaatatgttatataaaattttaaaaatgaggaagacatctcacagaattcccccaaaatgaggacatccaaggtcgacctctgacctccacatgattgcatttgtaactcatgtacacagagcacatgcatacatgaaacatgtgtgcacaatataatatgtgcacacaattctacaaatctataataacatgtatgcatacactacacacatgcacacatgtacacacataaacagttcagatcactggcatgttcacttaatgaaatgattttccagagtgaattaaaagctgcttccctaatgtagggtacgtgctccacctatggctctgattttcagtcctggctcttgaatgtctgactttgttgtttcctgggtgggtgagaggtttccaatttggtattgaagtccagggccttgtctattgtaagcacgctcttatcccttaactatgtccctgccctttggtttcggtttttgatttttcactggctgtcctggaacttactctgtagatcaggctggcctcaaactcagctccacctgccctatctcccaagtgctgggattagaggtgagcgccaccagtgcctggctgtttttgtttatgagacaaggtctcactgtgtacagctcacacatactcaagatcatctcgcttcagccttccaagtgctgggatcatagatgtgtgccatcacgcctgactcatactgacttactaaattattggtctattaaatagaccagtttgcggtgtttggatgatgtaacgatagacttctgttgagaatcataggggtggaaggatgagtcaaagttctggagccctgagtttgatccttagaacccatgtaagaagtgaactgactcccagaagtcaccctctgacatcactgcatgctgtgacgtgtgagcatccacaccgagacacaccttcctcacccctgactagtgaggggctggcagtgtggttctagtgggtaaaggcactctgtcgaaagcccagcaaagtgagttctgtcccaggattcatgggaagatggaaagagaaaaacaactcctagaaattgtcctctgacctttatttgtgtgctgtggcatgtatgtgcccttgcacacaataataatagaagtttgttaaaacagaaaaaaaatagataattttgttgtgggggtcaaaagccttccatcccatatgtaaggccctgggttccatcctcagcactttaacacagtaaagccaaggcatgcagttaaagcaattgccacgtaacaaacgggtagccttgccggaaggtgcgtgggccagtgttcaggagatgggctgaggaccttgagatcgaagccagcctggccttcataccgaggttctgtctcaaggaatcaaaacccaaaagtgtcctggatgttgttgagtgttcagtcttccatctccctaatagagagcgcgtgggaggcaccactctgtggtcagggctaggctctgggaatacctgaggagccagcattgctgaactggaagagcatttaccagcaggcagagtggctcagtggtagcatgaggtcccgggttcagcactcgcagtgcacagaaccaaacatga
cagtagacagctgtgacctcggcacctgggaaggagcggtaaaggcagaagtacagatccttggccacggagggactttggctcatctaagggaagacgccattgattgccaaggcggtgctcttggggtggtgctcctggaagtgtgactctcatagcccgtcactgagggaagtcagaatgggagttgatccagatgccgtgcagagtgatggtgactggtttgttcccacagctttctcagcctgttcccttacacaccccagaactgtctgcccagggatgcagcacccactgtgaaaatgcctcacagccttgctagaagccagtctggtgagggcatttttttcagttgaggctcccttttgccaaatgactatagcttgggtcaagttgacctcaaactagccaacacagtcacttactggaacaaaagttacggatgttttgggtaagagtgaggtacgtgaaggtgtttcatactcaacttactaccgagggagtcctagagccgattagaggcggcagagctgtgttttagtctgttagcctgaaaatgagggtgcagacatgggatcctgtaatctgtcagagttcccataaccaccccttccacccagtgcacggcagtggccagggcagcagccgtcactgagagagggcccctttcagagccctggggtcttactgtgttccttctccagCTCCAGGCTTTCTTCCTCGTGTTAGATGACATCATGGACTCTTCCCACACTCGCCGGGGGCAGATCTGCTGGTATCAGAAG-------------------CCGGGCATAGGCTTGGATGCCATCAACGATGCTCTGCTTCTGGAAGCCGCTATCTACCGCCTGCTTAAGTTCTACTGCAGGGAGCAGCCCTACTACCTCAACCTGCTGGAGCTCTTTCTACAG-------------------AGTTCCTATCAGACTGAGATCGGGCAGACTCTCGACCTCATCACAGCACCCCAGGGCCAAGTGGATCTTGGTAGATACACTGAAAAGAGgtgaggcccctggcaaccatgtgtagactttgaggcactcaacatgggcctagcccttaggagtgcatcttctcccctgactcagGTACAAATCTATCGTCAAGTACAAGACAGCTTTCTACTCTTTCTACCTGCCTATCGCGGCTGCCATGTACATGgtgagtcagtggcacctccactcttttcccttgggggcattgggatgggaggacatggagtagacattcagggtatgatggtataacttggagcaagccacccagtccctctgacactggaaagtgagcagagctgagtcctatctgagtgggaggggcagaggaaaatttaggctgagcaagagactaggtgtggcggtggctcttgggtccatcctggccagcctcttcaggttttctcttcattttgtaagaggttggagttggaagtggagctgtaatggctttccaggggaagatggcctgatgccttcagaatggattcattccttaggcctcccggggaggggctgaggttggactgcagttacgcagaacccatggaaagtgtcaagaagggcagggtgctgtggccatgctgccttacctgtctccctgtagGCTGGAATTGATGGGGAGAAGGAACACGCTAATGCCCTGAAGATCCTGCTGGAGATGGGCGAGTTCTTCCAGATCCAGgtaggagggtctgcagcgggaagtaataagaggattttccaccccacgctatgccgggcaccatggggtgttctcatctcggcagactgtgacccattgttcttaacactctccacccgactcgctaagttcccgtggtgggtcgctgccactccagccccatttaacaactgcccacctcacggttggctgtcacagatatctataccctggcaaaatcaaaactctagaggcttgtaatttattaatcagatttatatcagtaaattctcaccccacaaaacgccc
acataataaactcagagccaattgatactgatataaagggcccacctagataagatgtcctgtagtcatctatcccttatataatactcatagttacccgtggctgtttaaagctacacggatccgggtggtcctcttcctccgtcttccttctccccttcctgtgctctttgtctctagaattctcggcccacttttctttttcactgtccaatcacaggctcttgccttgtcttctacctgcccttccctgcttacagacagcagtgtacacaggtcctccagccacactgcatggagcttctcatctctcctttctgcctgctccctagGACGACTACCTTGATCTCTTTGGAGACCCCAGTGTGACCGGAAAGGTCGGCACTGACATCCAGGACAACAAATGCAGCTGGCTGGTGGTTCAGTGTCTGCTACGAGCCACTCCTCAGCAGCGCCAGATCTTAGAGgtgcccaagtggggcttgggggtggcatgtccctttgctaagaggggatgggggaggagtggttagaacagtgattttgctgtccaggagacatttgggagctcagtgcaatttggtgtatgtgctgaaggtagcaggcaggaccaggaagattttgtacagtgccagacaacacatgcccagtaactgtcagtggtgtcgaggctgagaatccatagaggaagtagaatcctgagccaatgcctgaactttggggttggggaggcaatgcttgagattggattcccagagagagagctttgccagtgttctggaatgtaggagttgtgagggcctttgaatccagccaaaaaaagctgtttctctgcttccagGAGAATTATGGGCAGAAGGACCCAGAAAAAGTGGCGCGGGTGAAAGCACTGTACGAGGAGCTGGATCTGCGGAGTGTGTTCTTCAAGTACGAGGAAGACAGTTACAACCGCCTCAAGAGTCTCATAGAGCAGTGCTCCGCGCCCCTGCCCCCATCCATCTTCCTGGAACTAGCAAACAAGATCTACAAGCGGAGAAAGtaacctcgaattgtagaggctgcgagggaggggtctcaataaattattgttcaacatcctgtggttttatgcttgtgtcaggagttcaaaatggaggggtgggagagggcaacttggaatccaggttggcaaaagaattccttgagacccaaagctgctgcgttgttagatgggtgacataggcacagcgtcaagtgttgcgtggaggttgggaggcggtcctgagctatgccacgcccatgcgtgaaggggtgcggctgcgcagccactgggttttgctggtgcccctcccctccgctctg
Then I tried `cat extra/example1.fa | ./cesar /dev/stdin`, and here's what I got:
m@MPC:~/code/TOGA/CESAR2.0$ cat extra/example1.fa | ./cesar /dev/stdin
>referenceExon
gCCTGGGAACTTCACCTACCACATCCCTGTCAGTAGTGGCACCCCACTGCACCTCAGCCTGACTCTGCAGATGaa
>mouse
ctttcctcatttcctcaggcttcagtatagcatgaggctgaggaggagagagggagaccggcaaagtggccttgcttaggtaccatctttgcccctttagGCTTGGCAACTTCACCTACCACATCCCTGTCAGCAGCAGCACACCACTGCACCTCAGCCTGACCCTGCAGATGAAgtgagtgctggtgtgtgggtatgtgtgggggaccatgtggaagccctcagaaaagtgaaagccaagtgcttactaaatttattacgtggagggtccaggc
Does anything look unusual here? Any red flags for what might be going on?
@mprincipato
I apologise for the long delay... Also, I would recommend using virtualenv to manage separate Python environments for different projects =).
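As a minimal sketch of that advice, the snippet below uses the standard-library `venv` module, swapped in for the third-party `virtualenv` package. Neither tool installs Python itself: an environment is tied to the interpreter that creates it, so running this under `python3.11` gives a 3.11 environment without changing the system default `python3`. The helper name `make_project_env` is hypothetical, and the 3.11 minimum is taken from the docs as quoted earlier in the thread.

```python
import sys
import venv
from pathlib import Path

# Minimum version taken from the TOGA docs as quoted in this thread.
MIN_PYTHON = (3, 11)


def make_project_env(env_dir: str, min_version=MIN_PYTHON,
                     with_pip: bool = True) -> Path:
    """Create a per-project environment for the interpreter running this code.

    Hypothetical helper. venv does not install Python: run it with the
    interpreter you want (e.g. python3.11) to get an environment for it.
    """
    current = sys.version_info[:2]
    if current < tuple(min_version):
        wanted = ".".join(map(str, min_version))
        raise RuntimeError(
            f"need Python {wanted}+, but this is {current[0]}.{current[1]}")
    # Isolated site-packages; the system-wide default python3 is untouched.
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)
```

Run under `python3.11`, `make_project_env("toga_env")` creates `toga_env/`; after `source toga_env/bin/activate`, `python` refers to 3.11 in that shell only.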
I should say, there are absolutely no red flags in the provided output, which makes everything more complicated.
The part that crashes is `/home/m/code/TOGA/CESAR2.0/cesar /dev/stdin -x 3`. It consumes the input directly from stdin, without creating any temporary input files. My guess was that this does not work on WSL; however, it seems there is another reason for this behaviour.
I need to add more logging and maybe provide an option to write these inputs to temporary files.
Will keep you updated.
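One way the temporary-input-file option mentioned above could look, as a sketch only: `run_cesar_with_temp_file` is a hypothetical helper, not TOGA's actual wrapper, and the `-x 3` default is simply copied from the crashing command. Instead of piping the data to `/dev/stdin`, it writes the input to a regular file, passes that path to CESAR, and removes the file afterwards even if CESAR fails.

```python
import os
import subprocess
import tempfile
from typing import Sequence


def run_cesar_with_temp_file(fasta_text: str, cesar_bin: str = "./cesar",
                             extra_args: Sequence[str] = ("-x", "3")
                             ) -> subprocess.CompletedProcess:
    """Write the CESAR input to a temporary file instead of piping it.

    Hypothetical helper: TOGA's real wrapper may be organised differently.
    """
    fd, path = tempfile.mkstemp(suffix=".fa")
    try:
        # Write the FASTA text to a regular file on disk.
        with os.fdopen(fd, "w") as handle:
            handle.write(fasta_text)
        # Pass the file path where /dev/stdin used to go.
        return subprocess.run([cesar_bin, path, *extra_args],
                              capture_output=True, text=True)
    finally:
        # Always clean up, even if CESAR fails or raises.
        os.remove(path)
```

A caller could log `result.returncode` and `result.stderr` whenever the return code is non-zero, which would also cover part of the extra logging.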
Hello, dev team!
I'm very new to TOGA, and I'm trying it out on some public data for a class project.
When I run it on my personal Windows machine (using Windows Subsystem for Linux), it runs for about eight and a half hours before saying:
```
### CESAR jobs done ###
Checking whether all CESAR results are complete
!!CRITICAL: Too many (4576) CESAR jobs died, please check your input data and re-run TOGA
!!CRITICAL: Too many (4576) CESAR jobs died, please check your input data and re-run TOGA
Program finished with exit code 1
```
When I look in the CESAR job logs, I find several error messages like this (slightly edited to remove personal information):
Here was the command I originally used to run TOGA:
`./toga.py ../toga_input/hg38.HLpipKuh2.allfilled.chain supply/hg38.wgEncodeGencodeCompV34.bed ../toga_input/hg38.2bit ../toga_input/HLpipKuh2.2bit --kt -i supply/hg38.wgEncodeGencodeCompV34.isoforms.txt --cesar_mem_limit 5 --cesar_jobs_num 500`
Do you have any guesses what might be going wrong to cause the CESAR wrapper command to crash?
As far as my inputs go, I got `hg38.HLpipKuh2.allfilled.chain` and `HLpipKuh2.fa` from this Bat1K page, and I converted `HLpipKuh2.fa` to 2bit using `faToTwoBit`.
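Since TOGA's message asks to check the input data, one cheap consistency check after a `faToTwoBit` conversion is that every query sequence named in the chain file also exists in the query `.2bit` file. The sketch below assumes the UCSC chain format (the query name, `qName`, is the 8th whitespace-separated field of each `chain` header line), the UCSC `.2bit` layout (a 16-byte header, then an index of name/offset pairs; only version 0 with 32-bit offsets is handled), and that HLpipKuh2 is the query side of `hg38.HLpipKuh2.allfilled.chain`. Both function names are hypothetical.

```python
import gzip
import struct

TWOBIT_SIGNATURE = 0x1A412743  # magic number from the UCSC .2bit spec


def twobit_sequence_names(path: str) -> set:
    """Return the sequence names stored in a version-0 .2bit file's index."""
    with open(path, "rb") as fh:
        header = fh.read(16)
        if len(header) < 16:
            raise ValueError(f"{path}: too short to be a .2bit file")
        # The signature is written in the creating machine's byte order,
        # so try little-endian first, then big-endian.
        for order in ("<", ">"):
            signature, version, count, _reserved = struct.unpack(
                order + "4I", header)
            if signature == TWOBIT_SIGNATURE:
                break
        else:
            raise ValueError(f"{path}: not a .2bit file (bad signature)")
        if version != 0:
            raise ValueError(f"{path}: only version-0 .2bit files are handled")
        names = set()
        for _ in range(count):
            size = fh.read(1)
            if not size:
                raise ValueError(f"{path}: index is truncated")
            names.add(fh.read(size[0]).decode("ascii"))
            fh.read(4)  # skip this sequence's 32-bit record offset
    return names


def chain_query_names(path: str) -> set:
    """Collect qName (8th field) from every 'chain' header line."""
    opener = gzip.open if path.endswith(".gz") else open
    names = set()
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("chain"):
                names.add(line.split()[7])
    return names
```

For example, `chain_query_names("../toga_input/hg38.HLpipKuh2.allfilled.chain") - twobit_sequence_names("../toga_input/HLpipKuh2.2bit")` should be an empty set; any names it returns are scaffolds the chains refer to but the 2bit file lacks.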