sjroth / ARTDeco

ARTDeco gets stuck generating expression data in FPKMs #8

Closed paulocaldas closed 2 years ago

paulocaldas commented 2 years ago

So far I've been running ARTDeco with almost no problems, but when I have too many samples (in this example 200) ARTDeco gets stuck in the last step (image below). It has been like this for 4 days now ... It's probably due to the number of samples, but it's strange that ARTDeco can still generate the individual FPKM files, but not the all_dogs file.

image

sjroth commented 2 years ago

When reporting issues, can you also include your command? It makes it easier to debug.

This behavior is not strange if you are running out of RAM during this operation. There are other command-line options for this step that are less memory-intensive, though slower. Can you confirm that the individual FPKM files are generated?

paulocaldas commented 2 years ago

This was my command:

ARTDeco -home-dir /artdeco_results/ -bam-files-dir /bam_files/ -gtf-file human.modified.annotation.file.gencodev37.gtf -cpu 8 -chrom-sizes-file human.genome.chrom.sizes -min-dog-len 2000 -dog_window 200 -min_dog_coverage 0.2

And yes, the other files are being generated. Here's the "head" of the file inside the dogs folder: [image]

Should I use more cores via the -cpu parameter, or is there a different option that uses less memory (even if it is slower)? I've been working with datasets of 50 to 120 samples until now and everything went fine, but now, with roughly 200, ARTDeco gets stuck.

sjroth commented 2 years ago

I would use more CPUs if possible and try again. Is ARTDeco generating temporary files for expression quantification? And can you monitor the processing to ensure that it is calling HOMER's annotatePeaks function?
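One way to do the monitoring suggested above is to poll the process table for the HOMER call and note how much resident memory it holds. The following is a minimal sketch, not part of ARTDeco: the helper names `parse_ps` and `running` are hypothetical, it assumes a POSIX `ps` is available, and it assumes the HOMER step shows up with "annotatePeaks" somewhere in its command line.

```python
import subprocess


def parse_ps(ps_output, needle):
    """Return (pid, rss, command) for every process whose command contains needle.

    Expects the text produced by `ps -A -o pid,rss,args`. The header line and
    any malformed line are skipped because their first two fields are not digits.
    """
    matches = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, rss, and the rest as the command
        if len(parts) < 3:
            continue
        pid, rss, args = parts
        if not (pid.isdigit() and rss.isdigit()):
            continue
        if needle in args:
            matches.append((int(pid), int(rss), args.rstrip()))
    return matches


def running(needle="annotatePeaks"):
    """Snapshot the process table once and return the matching processes."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid,rss,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps(result.stdout, needle)
```

Calling `running()` every few seconds while the last step is active would show whether any matching process exists and whether its resident set size (the `rss` column, usually reported in KiB) keeps growing. An empty list at a point where the run appears stuck suggests the pipeline is not in the HOMER step at all.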

paulocaldas commented 2 years ago

At this point (while stuck at that last step) it doesn't seem to be calling anything. Also, the /tmp folder is empty. I hope I'm checking it right. I will try again with more memory now. It will only try to create the missing files, of which there are two...

image

paulocaldas commented 2 years ago

Hi again @sjroth, just for the record: I didn't forget about this, but the computer cluster that I use has been super busy, and I can't use that much memory at once without affecting other people's work. I will still test whether using 16 cores can overcome this issue. In the meantime I'm having another problem, but I will open a different issue for that.

paulocaldas commented 2 years ago

@sjroth I finally got the chance to use more RAM and run 190 samples at once. ARTDeco worked like a charm! It's odd that ARTDeco still managed to generate all the individual dog files before, except for all_dogs.fpkm (where it got stuck for days), but (much) more RAM was the solution.

sjroth commented 2 years ago

Awesome! I'm very happy that it worked. I'm not surprised memory was the issue, as these processes can be leaky when aggregating lots of samples. One day I'll have to benchmark the memory usage to document what resources are required.
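For anyone who wants a rough number before such a benchmark exists, the peak memory of a run can be measured from a small wrapper. This is only a sketch under stated assumptions: `peak_child_rss` is a hypothetical helper, not an ARTDeco feature, and it relies on Python's Unix-only `resource` module. The reported value is the peak resident set size of the largest single child process that was waited for, not the sum over all children, and its unit is platform dependent (KiB on Linux, bytes on macOS).

```python
import resource
import subprocess


def peak_child_rss(cmd):
    """Run cmd to completion and return (exit_code, peak_rss).

    peak_rss comes from RUSAGE_CHILDREN, so it reflects the largest
    waited-for child of this Python process so far. Measure one command
    per interpreter session to keep the figure attributable to that command.
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss


# Example (hypothetical invocation; substitute your own full command line):
# code, peak = peak_child_rss(["ARTDeco", "-home-dir", "/artdeco_results/", "-cpu", "8"])
# print(f"exit code {code}, peak RSS {peak}")
```

Running this on datasets of increasing size (for example 50, 100, and 200 samples) would give a first indication of how memory demand grows with the number of samples.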