Closed lydiayliu closed 11 months ago
Update: the difference seems to be mainly due to loading Noncoding and AltTranslation? But why is the time difference from before so dramatic?
split/CPCG0269/CPCG0269_split4_CodonReassign.fasta
[ 2023-08-13 01:20:49 ] moPepGen summarizeFasta started
[ 2023-08-13 01:20:51 ] Reference indices loaded.
[ 2023-08-13 01:42:15 ] rss=187.8 MiB, vms=769.5 MiB, wallclock=0:21:07.730000, system=0:00:16.280000, cpu_usage=99.7%
split/CPCG0269/CPCG0269_split4_Noncoding.fasta
[ 2023-08-13 01:42:16 ] moPepGen summarizeFasta started
[ 2023-08-13 01:42:17 ] Reference indices loaded.
[ 2023-08-13 01:57:31 ] rss=231.5 MiB, vms=813.1 MiB, wallclock=0:14:23.560000, system=0:00:53.840000, cpu_usage=100.0%
split/CPCG0269/CPCG0269_split4_NotCirc-Noncoding.fasta
[ 2023-08-13 01:57:32 ] moPepGen summarizeFasta started
[ 2023-08-13 01:57:33 ] Reference indices loaded.
[ 2023-08-13 01:59:43 ] rss=186.6 MiB, vms=766.3 MiB, wallclock=0:02:04.970000, system=0:00:07.150000, cpu_usage=99.7%
split/CPCG0269/CPCG0269_split4_NotCirc-circRNA-Noncoding.fasta
[ 2023-08-13 01:59:43 ] moPepGen summarizeFasta started
[ 2023-08-13 01:59:45 ] Reference indices loaded.
[ 2023-08-13 02:00:12 ] rss=169.6 MiB, vms=750.6 MiB, wallclock=0:00:28.610000, system=0:00:01.680000, cpu_usage=99.9%
split/CPCG0269/CPCG0269_split4_NotCirc-circRNA.fasta
[ 2023-08-13 02:00:13 ] moPepGen summarizeFasta started
[ 2023-08-13 02:00:14 ] Reference indices loaded.
[ 2023-08-13 02:00:44 ] rss=173.2 MiB, vms=755 MiB, wallclock=0:00:31.350000, system=0:00:01.690000, cpu_usage=99.8%
split/CPCG0269/CPCG0269_split4_NotCirc.fasta
[ 2023-08-13 02:00:45 ] moPepGen summarizeFasta started
[ 2023-08-13 02:00:46 ] Reference indices loaded.
[ 2023-08-13 02:05:33 ] rss=186 MiB, vms=767.7 MiB, wallclock=0:04:44.770000, system=0:00:04.840000, cpu_usage=100.0%
split/CPCG0269/CPCG0269_split4_SECT-CodonReassign.fasta
[ 2023-08-13 02:05:33 ] moPepGen summarizeFasta started
[ 2023-08-13 02:05:35 ] Reference indices loaded.
[ 2023-08-13 02:06:02 ] rss=169.7 MiB, vms=751.3 MiB, wallclock=0:00:28.430000, system=0:00:01.770000, cpu_usage=99.9%
split/CPCG0269/CPCG0269_split4_SECT.fasta
[ 2023-08-13 02:06:02 ] moPepGen summarizeFasta started
[ 2023-08-13 02:06:04 ] Reference indices loaded.
[ 2023-08-13 02:06:31 ] rss=170 MiB, vms=751.6 MiB, wallclock=0:00:28.360000, system=0:00:01.630000, cpu_usage=99.9%
split/CPCG0269/CPCG0269_split4_circRNA-Noncoding.fasta
[ 2023-08-13 02:06:31 ] moPepGen summarizeFasta started
[ 2023-08-13 02:06:33 ] Reference indices loaded.
[ 2023-08-13 02:07:00 ] rss=170.9 MiB, vms=752.5 MiB, wallclock=0:00:29.020000, system=0:00:01.800000, cpu_usage=99.8%
split/CPCG0269/CPCG0269_split4_circRNA.fasta
[ 2023-08-13 02:07:01 ] moPepGen summarizeFasta started
[ 2023-08-13 02:07:02 ] Reference indices loaded.
[ 2023-08-13 02:10:42 ] rss=183.5 MiB, vms=765.3 MiB, wallclock=0:03:38.200000, system=0:00:04.590000, cpu_usage=100.0%
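To compare the runs above at a glance, the resource-usage lines can be parsed and ranked by wallclock time. This is only a rough sketch: it assumes the log layout shown above (a line ending in `.fasta` followed later by a line containing `wallclock=H:MM:SS.ffffff`), and the function name `rank_by_wallclock` is made up for illustration, not part of moPepGen.

```python
import re
from datetime import timedelta

# Matches the wallclock field as it appears in the log lines above,
# e.g. "wallclock=0:21:07.730000".
WALLCLOCK = re.compile(r"wallclock=(\d+):(\d+):(\d+(?:\.\d+)?)")


def rank_by_wallclock(log_text):
    """Return (fasta_path, seconds) pairs, slowest run first.

    Hypothetical helper: assumes each run starts with a line that ends
    in ".fasta" and is followed by one resource-usage line.
    """
    results = []
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.endswith(".fasta"):
            current = line  # remember which file the next timing belongs to
            continue
        match = WALLCLOCK.search(line)
        if match and current is not None:
            hours, minutes, seconds = match.groups()
            elapsed = timedelta(
                hours=int(hours), minutes=int(minutes), seconds=float(seconds)
            ).total_seconds()
            results.append((current, elapsed))
            current = None
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Calling `rank_by_wallclock(open("summarize.log").read())` (the file name is a placeholder) would list the split files from slowest to fastest.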
Turns out the cause is the memory-mapped GenomicAnnotationOnDisk: transcript information is retrieved from disk over and over again. So I got rid of that and only keep the useful information in memory. This should improve it.
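For anyone reading later, the general idea of the fix can be sketched roughly as below. This is not the actual moPepGen code: `DiskBackedAnnotation`, `build_lookup`, and the field names `gene_id` and `is_coding` are all hypothetical stand-ins, and the "disk read" is only simulated with a counter. It just illustrates reading each transcript once and keeping only the needed fields in a plain dict.

```python
class DiskBackedAnnotation:
    """Hypothetical stand-in for an on-disk annotation (not moPepGen's class).

    Every call to get_transcript() models one trip to disk and is counted.
    """

    def __init__(self, records):
        self._records = records
        self.reads = 0

    def get_transcript(self, tx_id):
        self.reads += 1  # simulated disk access
        return self._records[tx_id]


def build_lookup(annotation, tx_ids, fields=("gene_id", "is_coding")):
    """Read each distinct transcript once and keep only the needed fields."""
    lookup = {}
    for tx_id in set(tx_ids):
        record = annotation.get_transcript(tx_id)
        lookup[tx_id] = {field: record[field] for field in fields}
    return lookup


if __name__ == "__main__":
    # Made-up records purely for illustration.
    annotation = DiskBackedAnnotation({
        "TX1": {"gene_id": "G1", "is_coding": True, "exons": [(1, 100)]},
        "TX2": {"gene_id": "G2", "is_coding": False, "exons": [(5, 50)]},
    })
    # Many peptides map back to the same few transcripts.
    peptide_transcripts = ["TX1", "TX2", "TX1", "TX1", "TX2"]
    lookup = build_lookup(annotation, peptide_transcripts)
    for tx_id in peptide_transcripts:
        _ = lookup[tx_id]["is_coding"]  # served from memory, no extra reads
    print(annotation.reads)
```

With the lookup built up front, the number of simulated disk reads equals the number of distinct transcripts rather than the number of peptide records.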
Yup super fast now!
I just noticed this, but I've run summarizeFasta by itself on fastas of a variety of different sizes and it seems like taking ~40 minutes to run is the standard.
This is the command I used
I know CPCG has a lot of GVFs but that doesn't seem to be the reason. It's possible that it is due to loading the noncoding-peptides file, but summarizeFasta has always loaded the file. Could it be because now it needs to read the reference a lot from disk?
I also looked into CCLE, and realized that in the call-noncanonical peptide pipeline, summarize_fasta is now the roadblock? You can see the trace files in this working directory here:
/hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/work/4b/d60d67c5c7dc71b0cc63ae55ecb983/pipeline-meta-call-NonCanonicalPeptide-0.0.1/
Btw didn't realize that call_variant is this fast XD
As an example with a CCLE sample run on May 18 (before the GTF memory change):