My first guess is that Nextflow failed to publish the output files to publishDir. Not 100% sure, but it could be that I'm using "move" here. I had a similar issue before with "move". Maybe changing it to "copy" would resolve it.
So I'll just run the 3 samples again separately? Sad that it was so close to being perfect XD
I still need to run the fasta entry runs; can you fix this before I do that?
I think we can just change it to "copy"
I tried again with these 3 samples, running just the 3 both before and after we changed "move" to "copy". Both runs are still going...
yiyangliu@ip-0A125212:~$ squeue -u yiyangliu
JOBID PARTITION     NAME     USER ST    TIME NODES NODELIST(REASON)
  284        F2 CCLE-mpg yiyangli  R 1:31:47     1 CZOHHPCSLURMPOC01-F2-10
  287        F2 CCLE-mpg yiyangli  R   34:54     1 CZOHHPCSLURMPOC01-F2-15
  285       F72 nf-call_ yiyangli  R 1:31:33     1 CZOHHPCSLURMPOC01-F72-5
  288       F72 nf-call_ yiyangli  R   30:24     1 CZOHHPCSLURMPOC01-F72-3
From the log it seems like one of the samples is not being run: /hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/work/96/f58e5126055ad80aa01132d25483e8/.command.log
ACH-000670 is the problematic sample.
I'm running moPepGen callVariant directly on the GVFs of this sample:
moPepGen callVariant \
--index-dir /hot/users/yiyangliu/MoPepGen/Index/GRCh38-EBI-GENCODE34/ \
-i /hot/users/yiyangliu/MoPepGen/Random/ACH-000670_starfusion.gvf /hot/users/yiyangliu/MoPepGen/Random/ACH-000670_vep_gencode.gvf \
--verbose-level 2 \
--threads 16 \
-o /hot/users/yiyangliu/MoPepGen/Random/ACH-000670_variant_peptides.fasta
It's been 60+ minutes on one of these transcripts, not the most promising XD
[ 2022-05-28 01:28:56 ] ['ENST00000519984.1', 'ENST00000519529.1', 'ENST00000519503.5', 'ENST00000256412.8', 'ENST00000522298.1', 'ENST00000520193.1', 'ENST00000519301.6', 'ENST00000652698.1', 'ENST00000651149.1', 'ENST00000650866.1', 'ENST00000650856.1', 'ENST00000520407.5', 'ENST00000523534.5', 'ENST00000651335.1', 'ENST00000631040.2', 'ENST00000523079.5']
I cancelled my other jobs because I think we know where the problem is.
Seems like there are a lot of fusions in those transcripts. They are probably from the same gene. I'll keep it running overnight to see how it goes.
Still going! Though it was using 9 CPUs yesterday and we are down to 8 now:
9a2883c85f7c sweet_shockley 800.13% 15.21GiB / 62.76GiB 24.23
It finished! Took 16 hours lol. Is it something that is worth investigating?
[ 2022-05-28 01:28:56 ] ['ENST00000519984.1', 'ENST00000519529.1', 'ENST00000519503.5', 'ENST00000256412.8', 'ENST00000522298.1', 'ENST00000520193.1', 'ENST00000519301.6', 'ENST00000652698.1', 'ENST00000651149.1', 'ENST00000650866.1', 'ENST00000650856.1', 'ENST00000520407.5', 'ENST00000523534.5', 'ENST00000651335.1', 'ENST00000631040.2', 'ENST00000523079.5']
[ 2022-05-28 17:58:06 ] ['ENST00000650919.1', 'ENST00000356819.7', 'ENST00000651807.1', 'ENST00000650967.1', 'ENST00000652588.1', 'ENST00000521670.5', 'ENST00000287842.7', 'ENST00000650980.1', 'ENST00000405005.7', 'ENST00000651175.1', 'ENST00000650964.1', 'ENST00000520073.5', 'ENST00000523358.5', 'ENST00000523187.5', 'ENST00000518036.5', 'ENST00000328195.8']
The problem is here: when cleaving a PVGNode, the first cleavage site is found, the node is cleaved there, and then the next site is looked for. The function find_first_cleave_or_stop_site here is therefore called multiple times. Although it's a generator, the node sequence gets changed after every cleavage, so the search has to start over each time and it is still very inefficient. I'm opening a PR right now.
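To illustrate the pattern, here is a minimal sketch in plain Python. It uses strings as a stand-in for the PVGNode sequence and a simplified stand-in for find_first_cleave_or_stop_site; the helper names and the trypsin-like K/R cleavage rule are assumptions for the example, not moPepGen's actual code. The point is that rebuilding the remaining sequence and restarting the search after every cleavage costs roughly O(n) per cleavage site, so a giant node with many sites becomes quadratic overall, whereas collecting all sites in one pass over the original sequence stays linear.

# Hypothetical sketch: strings stand in for the PVGNode sequence.
def find_first_cleave_site(seq):
    """Scan from the start of seq and return the position right after the
    first cleavage residue (K/R here), or -1 if there is none."""
    for i, aa in enumerate(seq):
        if aa in ('K', 'R'):
            return i + 1
    return -1

def cleave_rescan(seq):
    """Pattern described above: cleave at the first site, rebuild the
    remaining sequence, then search again from scratch. Each seq[site:]
    copy is O(n), so a giant node with k sites costs about O(n * k)."""
    peptides = []
    while seq:
        site = find_first_cleave_site(seq)
        if site == -1:
            peptides.append(seq)
            break
        peptides.append(seq[:site])
        seq = seq[site:]  # the sequence changes, so the next search starts over
    return peptides

def cleave_single_pass(seq):
    """Alternative: find every cleavage site in one pass over the original
    sequence, then slice once. Linear in the sequence length."""
    sites = [i + 1 for i, aa in enumerate(seq) if aa in ('K', 'R')]
    bounds = [0] + sites + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if a != b]

if __name__ == '__main__':
    toy = 'MKTAYRIAKLLRGSPK' * 4
    assert cleave_rescan(toy) == cleave_single_pass(toy)

This is only meant to show why the giant node took so long; the actual fix in the PR may look different.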
I'm surprised that the update to 0.5.0 seemed to reveal this. Was it because the "pure" fusion was just not considered before?
It's just because the transcript is so big. Most of the time was spent on cleaving the giant node.
I almost didn't notice this. For the 22-hour meta pipeline run, I have 373 samples with databases produced. There seems to be one that has been hanging for 11 hours?? That accounts for the last 3 samples that still need to run.
Don't really know how to troubleshoot this... Here's the log:
/hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/log/CCLE.log
The workdir is here:
/hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/work/