s-andrews / FastQC

A quality control analysis tool for high throughput sequencing data
GNU General Public License v3.0

Max no. of fastq files which can be QC'd #136

Open: Gokula139 opened this issue 2 months ago

Gokula139 commented 2 months ago

Hello,

I am running FastQC on 90 fastq files non-interactively. The instance has been running for a long time and doesn't seem to be stopping. When I monitor the process, CPU and network usage were good for the first 2 hours after starting; after that, both dropped to negligible levels. I am raising this issue to find out why. Please help me fix it.

Thanks.

s-andrews commented 2 months ago

To get a more definitive answer to this I'd need to know

Processing 90 files should be fine, and if you have multiple CPUs to throw at the job you can run analyses in parallel by specifying the --threads option when running the program.
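A minimal sketch of such a parallel run, written as a bash function so it can be reused. The function name `run_fastqc_batch` and the `qc_reports` directory in the usage line are hypothetical; `--threads` is the option mentioned above, and `--outdir` is FastQC's output-directory option, which its help text says must already exist, hence the `mkdir -p`.

```shell
#!/usr/bin/env bash
# run_fastqc_batch THREADS OUTDIR FILE...
# Hypothetical wrapper: runs a single FastQC process over all FILEs,
# analysing up to THREADS files at the same time.
run_fastqc_batch() {
    local threads=$1 outdir=$2
    shift 2
    # FastQC's help says the output directory must already exist, so create it first.
    mkdir -p "$outdir"
    fastqc --threads "$threads" --outdir "$outdir" "$@"
}
```

For example, `run_fastqc_batch 4 qc_reports /data/*.fastq.gz` would analyse four files at a time.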

There was a known problem in a previous version of FastQC where, if one file out of a large set had a problem, the program wouldn't exit correctly at the end of processing. It would appear to still be running even though all of the QC reports (apart from the one for the broken input file) had already been generated, so you might have hit this. This problem was fixed in the latest release, though.

When you look in the folder which had the data in it, do you see the HTML files for the individual QC reports?

Gokula139 commented 2 months ago
s-andrews commented 2 months ago

OK, so you have the latest version, which shouldn't have the problem of stalling upon failure. If all of the files are there then something else must be causing the program to stay open, but without being able to see the output it wrote I really have nothing to go on to diagnose this, I'm afraid.

Gokula139 commented 2 months ago

No, the HTML files have not been generated for all the fastq files. HTML files were created for only 40 of them, and then there was no response from the container. It has been running for a long time and is not stopping.

s-andrews commented 2 months ago

If you've run this as a single-threaded process then you should be able to figure out which file is causing the crash/stall, as it would be the next one in the list which didn't get processed. If you can then run that file on its own and see what output is generated, we can try to track this down. Alternatively, if you can share the failing file with me then I can run it and see what happens. If the process has been stalled for more than a few minutes with no additional output then there's no point leaving it running, so you can kill it and start again on the problematic file.
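One way to find the file that stalled is to compare the input list against the reports already written. This sketch assumes FastQC's usual naming, where `sample_R1.fastq.gz` produces `sample_R1_fastqc.html`, and handles only the `.gz`, `.bz2`, `.fastq` and `.fq` extensions; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# report_name FILE: print the HTML report name FastQC is assumed to write for FILE.
# Assumption: compression and fastq extensions are stripped and "_fastqc.html" appended,
# e.g. /data/sample_R1.fastq.gz -> sample_R1_fastqc.html (other extensions not handled).
report_name() {
    local base
    base=$(basename "$1")
    base=${base%.gz}
    base=${base%.bz2}
    base=${base%.fastq}
    base=${base%.fq}
    printf '%s_fastqc.html\n' "$base"
}

# missing_reports OUTDIR FILE...: print, in input order, each FILE whose report is not in OUTDIR.
missing_reports() {
    local outdir=$1 f
    shift
    for f in "$@"; do
        if [ ! -e "$outdir/$(report_name "$f")" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

With a single thread, the first file printed by `missing_reports qc_reports /data/*.fastq.gz` is the likely culprit; with `--threads N`, up to N files were in progress at once, so any of the first few is a candidate.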

Gokula139 commented 2 months ago

Hello,

I am trying to find the logs for this process, but I couldn't find them. Where are the logs written? Directly to the console, or somewhere else? If somewhere else, could you please tell me where I can find them?

Thanks

Gokula139 commented 2 months ago

Could the issue be caused by memory? What is the default memory the JVM uses here? Is there any way to increase it through parameters?

s-andrews commented 2 months ago

If you're using the latest FastQC then it will assign 512MB to each thread you launch. This should be enough for pretty much all libraries, but if you need more then you can increase this using the --mem command line option.

When launching this on a cluster you will need to assign slightly more than 512MB per thread, as the JVM itself adds some memory overhead. Assigning a few GB should cause no issues and will be more than enough.
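The per-thread arithmetic for a cluster memory request can be sketched as follows. The 512 MB default comes from the comment above; the 1024 MB allowance for JVM overhead is an assumed safety margin, not a figure published by FastQC, and `cluster_mem_mb` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# cluster_mem_mb THREADS [PER_THREAD_MB] [JVM_OVERHEAD_MB]
# Hypothetical helper: total MB to request = threads * per-thread memory + JVM overhead.
# PER_THREAD_MB defaults to FastQC's 512; the 1024 MB overhead is an assumed margin.
cluster_mem_mb() {
    local threads=$1 per_thread=${2:-512} overhead=${3:-1024}
    echo $(( threads * per_thread + overhead ))
}
```

For example, `cluster_mem_mb 4` gives 3072, which could be passed to Slurm's `sbatch` as `--mem=3072M` for a four-thread run. If you raise the per-thread value with `--mem`, pass the same number as the second argument.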

s-andrews commented 2 months ago

> I am trying to find logs for this process. But I couldn't find the logs, where the logs will be written to? Directly in the console or somewhere. If it is somewhere, could you please tell me where I can find it?

It really depends on how you've run this. FastQC itself just writes this information to stdout and stderr, so it will go wherever you send those streams when you launch the program. If you just did:

fastqc mydata.fq.gz

Then it will just print to the terminal in which the process was launched. If you did a redirection such as:

fastqc mydata.fq.gz > log.txt 2>errors.txt

Then the standard output will go into log.txt and the errors into errors.txt.
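When the goal is to find which input stalls, a variation on this redirection is to run the files one at a time with a separate log pair per file. This is a sketch assuming bash; `run_each_with_logs` and the `logs` directory are hypothetical names.

```shell
#!/usr/bin/env bash
# run_each_with_logs LOGDIR FILE...
# Hypothetical helper: runs FastQC on one file at a time and sends each file's
# stdout and stderr to its own pair of log files, so a stall can be pinned to one input.
run_each_with_logs() {
    local logdir=$1 f name status
    shift
    mkdir -p "$logdir"
    for f in "$@"; do
        name=$(basename "$f")
        echo "$(date '+%F %T') starting $f"
        fastqc "$f" > "$logdir/$name.log" 2> "$logdir/$name.err"
        status=$?
        echo "$(date '+%F %T') finished $f (exit status $status)"
    done
}
```

If the run hangs, the last `starting` line without a matching `finished` line names the problematic file, and its `.log` and `.err` files hold whatever FastQC wrote before it stopped.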

If you've run this on a cluster with something like:

ssub -o log.txt fastqc mydata.fq.gz

Then it will go into the log file you specified. If you are using slurm and don't specify a log file then it goes to a file in your home directory named after the job id, so something like 1234.o or 1234.e.
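To avoid relying on the scheduler's default log location, the job script can name its log files explicitly. This sketch assumes Slurm's `sbatch`, where `%j` in `--output`/`--error` expands to the job ID; the resource values, job name, `qc_reports` directory and the `write_fastqc_job` helper are illustrative assumptions.

```shell
#!/usr/bin/env bash
# write_fastqc_job SCRIPT: write a hypothetical Slurm batch script that names its own
# log files, so FastQC's stdout/stderr land somewhere predictable.
write_fastqc_job() {
    cat > "$1" <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=fastqc
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
#SBATCH --output=fastqc_%j.out
#SBATCH --error=fastqc_%j.err
# %j is replaced by the job ID, so each run gets its own pair of log files.
mkdir -p qc_reports
fastqc --threads "$SLURM_CPUS_PER_TASK" --outdir qc_reports "$@"
EOF
    chmod +x "$1"
}
```

`write_fastqc_job fastqc_job.sh && sbatch fastqc_job.sh /data/*.fastq.gz` would then write `fastqc_<jobid>.out` and `fastqc_<jobid>.err` in the directory the job was submitted from.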