wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License

Concatenate the different NanoStats.txt files #359

Closed: SueFletcher closed this issue 1 month ago

SueFletcher commented 3 months ago

Hello,

I have individual fastq.gz files, one per barcode. When I run the standard NanoPlot command on each of them, I get a separate output folder (containing the summary, plots, etc.) for each barcode. Is there a way to combine all these NanoStats.txt files into one large table where the columns are my barcodes and the rows are the different metrics?

This is the code I'm using right now:

    import os
    import subprocess


    class NanoPlotQualityChecker:
        def __init__(self, input_folder, output_folder, num_threads):
            self.input_folder = input_folder
            self.output_folder = output_folder
            self.num_threads = num_threads

        def check_quality(self):
            # Create the output folder if it doesn't exist
            os.makedirs(self.output_folder, exist_ok=True)

            # Get a list of all FASTQ files in the input folder
            fastq_files = [f for f in os.listdir(self.input_folder)
                           if f.endswith('.fastq') or f.endswith('.fastq.gz')]

            # Run NanoPlot for each FASTQ file; each barcode gets its own output folder
            for fastq_file in fastq_files:
                input_path = os.path.join(self.input_folder, fastq_file)
                # e.g. 'barcode01.fastq.gz' -> '<output_folder>/barcode01'
                output_path = os.path.join(self.output_folder, fastq_file.split('.')[0])
                # Pass the arguments as a list (no shell) and stop if NanoPlot fails
                nanoplot_cmd = ['NanoPlot', '--fastq', input_path,
                                '-o', output_path, '--threads', str(self.num_threads)]
                subprocess.run(nanoplot_cmd, check=True)


    if __name__ == "__main__":
        input_folder = '/data/fastq'
        output_folder = '/data/quality_plots'
        num_threads = 6  # User to specify the number of threads

        nanoplot_checker = NanoPlotQualityChecker(input_folder, output_folder, num_threads)
        nanoplot_checker.check_quality()

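If you keep the per-barcode NanoPlot runs, the combined table can also be built afterwards by reading each `<output_folder>/<barcode>/NanoStats.txt`. The sketch below assumes the summary lines have the form `Metric:   value` (as in the "General summary" part of the file) and skips everything else, such as the `>Q5:` cutoff lines and the numbered top-5 entries; the exact layout of NanoStats.txt has changed between NanoPlot versions, so check it against your own files. The function names (`parse_nanostats`, `collect_nanostats`, `to_rows`, `write_tsv`) are hypothetical helpers, not part of NanoPlot.

```python
import csv
import os


def parse_nanostats(lines):
    """Return {metric: value} from 'Metric:   value' lines of a NanoStats.txt file.

    Assumption: metrics are written one per line as 'Metric: value'. Lines without
    a colon or a value, and lines whose key does not start with a letter
    (e.g. '>Q5:' quality cutoffs or '1:' top-5 entries), are skipped.
    """
    stats = {}
    for line in lines:
        metric, sep, value = line.partition(':')
        metric, value = metric.strip(), value.strip()
        if sep and value and metric[:1].isalpha():
            stats[metric] = value
    return stats


def collect_nanostats(output_folder):
    """Read <output_folder>/<barcode>/NanoStats.txt into {barcode: {metric: value}}."""
    table = {}
    for barcode in sorted(os.listdir(output_folder)):
        path = os.path.join(output_folder, barcode, 'NanoStats.txt')
        if os.path.isfile(path):
            with open(path) as handle:
                table[barcode] = parse_nanostats(handle)
    return table


def to_rows(table):
    """Header row of barcodes, then one row per metric (missing values left empty)."""
    barcodes = list(table)
    # Keep metrics in first-seen order across all barcodes
    metrics = list(dict.fromkeys(m for stats in table.values() for m in stats))
    rows = [['Metric'] + barcodes]
    for metric in metrics:
        rows.append([metric] + [table[b].get(metric, '') for b in barcodes])
    return rows


def write_tsv(rows, out_path):
    """Write the rows as a tab-separated file."""
    with open(out_path, 'w', newline='') as handle:
        csv.writer(handle, delimiter='\t').writerows(rows)
```

For example, `write_tsv(to_rows(collect_nanostats('/data/quality_plots')), '/data/quality_plots/NanoStats_combined.tsv')` would write one table with barcodes as columns and metrics as rows; the output file name is only an illustration. Values are kept as the original strings (e.g. with thousands separators), which avoids guessing at number formats.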
wdecoster commented 3 months ago

Could you see if NanoComp solves your problem? You can run it with all the fastq files as input, and each will be processed as a separate dataset.
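A minimal sketch of that single NanoComp call from Python, replacing the per-file NanoPlot loop. It assumes NanoComp's `--fastq`, `--names`, `--outdir` and `--threads` options (check `NanoComp --help` for your installed version) and names each dataset after its file, so the statistics should be reported per barcode; the exact layout of NanoComp's stats output may differ between versions. `build_nanocomp_cmd` and `run_nanocomp` are hypothetical helper names.

```python
import os
import subprocess


def build_nanocomp_cmd(fastq_files, output_folder, num_threads):
    """Build one NanoComp call that treats each FASTQ file as a separate dataset."""
    # Dataset names from file names, e.g. 'barcode01.fastq.gz' -> 'barcode01'
    names = [os.path.basename(f).split('.')[0] for f in fastq_files]
    return ['NanoComp', '--fastq', *fastq_files,
            '--names', *names,
            '--outdir', output_folder,
            '--threads', str(num_threads)]


def run_nanocomp(input_folder, output_folder, num_threads):
    """Collect all FASTQ files in input_folder and run NanoComp once on all of them."""
    fastq_files = sorted(
        os.path.join(input_folder, f) for f in os.listdir(input_folder)
        if f.endswith('.fastq') or f.endswith('.fastq.gz'))
    subprocess.run(build_nanocomp_cmd(fastq_files, output_folder, num_threads), check=True)
```

For example, `run_nanocomp('/data/fastq', '/data/nanocomp', 6)`, where `/data/nanocomp` is a placeholder output folder.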