rotary-genomics / spokewrench

Toolkit for manipulating circular DNA sequence elements
BSD 3-Clause "New" or "Revised" License

Implement a function that calls a chain of subcommands piped into each other. #7

Open LeeBergstrand opened 4 months ago

LeeBergstrand commented 4 months ago
@jmtsuji Here, we might be able to implement a function that calls a chain of subcommands piped into each other.

Also, note that there may be more sophisticated libraries for performing this kind of work, like https://pypi.org/project/sh/, that remove some of the boilerplate. I was initially going to use sh for rotary, but there were some features that were missing so I did not use it. It still might be useful for your code.

_Originally posted by @LeeBergstrand in https://github.com/rotary-genomics/rotary-utils/pull/6#discussion_r1566571848_

LeeBergstrand commented 4 months ago

Previous Code

    with open(log_filepath, write_mode) as logfile_handle:
        with open(output_bam_filepath, 'w') as bam_handle:
            # TODO - add support for different flags like -ax for pacbio
            minimap_args = ['minimap2', '-t', str(threads), '-ax', 'map-ont', contig_filepath, long_read_filepath]
            minimap = run_pipeline_subcommand(command_args=minimap_args, stdout=subprocess.PIPE, stderr=logfile_handle)

            samtools_view_args = ['samtools', 'view', '-b', '-@', str(threads)]
            samtools_view = run_pipeline_subcommand(command_args=samtools_view_args, stdin=minimap,
                                                    stdout=subprocess.PIPE, stderr=logfile_handle)

            samtools_sort_args = ['samtools', 'sort', '-@', str(threads), '-m', f'{threads_mem_mb}M']
            run_pipeline_subcommand(command_args=samtools_sort_args, stdin=samtools_view, stdout=bam_handle,
                                    stderr=logfile_handle)

        samtools_index_args = ['samtools', 'index', '-@', str(threads), output_bam_filepath]
        run_pipeline_subcommand(command_args=samtools_index_args, stderr=logfile_handle)

The code above could be written as:

    with open(log_filepath, write_mode) as logfile_handle:
        with open(output_bam_filepath, 'w') as bam_handle:
            # TODO - add support for different flags like -ax for pacbio
            minimap_args = ['minimap2', '-t', str(threads), '-ax', 'map-ont', contig_filepath, long_read_filepath]
            samtools_view_args = ['samtools', 'view', '-b', '-@', str(threads)]
            samtools_sort_args = ['samtools', 'sort', '-@', str(threads), '-m', f'{threads_mem_mb}M']

            run_chained_subcommand(commands=[minimap_args, samtools_view_args, samtools_sort_args],
                                   stdout=bam_handle, stderr=logfile_handle)

        samtools_index_args = ['samtools', 'index', '-@', str(threads), output_bam_filepath]
        run_pipeline_subcommand(command_args=samtools_index_args, stderr=logfile_handle)

LeeBergstrand commented 4 months ago

@jmtsuji Does this make sense to you? All you would need is a function called run_chained_subcommand that loops through the subcommand lists and pipes the output of each one into the next subcommand in the chain. This replaces the repeated run_pipeline_subcommand calls for the piped steps with a single call and cuts that code roughly in half.
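For illustration, here is a minimal sketch of what such a function might look like. It is built directly on `subprocess.Popen` rather than on the existing run_pipeline_subcommand (whose internals are not shown in this thread), so the signature, the error handling, and the choice to raise `CalledProcessError` are all assumptions rather than the eventual implementation. It also assumes `stdout` is a file handle or `None`, not `subprocess.PIPE`.

```python
import subprocess
import sys


def run_chained_subcommand(commands, stdout=None, stderr=None):
    """Run each argument list in `commands`, piping stdout of one into stdin of the next.

    Hypothetical sketch: `stdout` receives the output of the final command only,
    while `stderr` is shared by every command in the chain.
    """
    if not commands:
        raise ValueError('commands must contain at least one argument list')

    processes = []
    previous_stdout = None
    for index, command_args in enumerate(commands):
        is_last = index == len(commands) - 1
        process = subprocess.Popen(command_args,
                                   stdin=previous_stdout,
                                   stdout=stdout if is_last else subprocess.PIPE,
                                   stderr=stderr)
        if previous_stdout is not None:
            # Close the parent's copy of the pipe so that the upstream command
            # sees a broken pipe if the downstream command exits early.
            previous_stdout.close()
        previous_stdout = process.stdout
        processes.append(process)

    # Wait for every stage before reporting a failure, so no process is left unreaped.
    return_codes = [process.wait() for process in processes]
    for command_args, return_code in zip(commands, return_codes):
        if return_code != 0:
            raise subprocess.CalledProcessError(return_code, command_args)


if __name__ == '__main__':
    # Stand-in commands (not minimap2/samtools) so the demo runs anywhere Python does.
    produce = [sys.executable, '-c', "print('hello')"]
    to_upper = [sys.executable, '-c',
                'import sys; sys.stdout.write(sys.stdin.read().upper())']
    run_chained_subcommand([produce, to_upper])  # writes HELLO to the terminal
```

With something along these lines, the minimap2, samtools view, and samtools sort argument lists could be passed together as `commands`, with `stdout=bam_handle` and `stderr=logfile_handle`, as in the proposed call above.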

LeeBergstrand commented 4 months ago

@jmtsuji This code could also be used for subset_reads_from_bam

LeeBergstrand commented 4 months ago

@jmtsuji I would recommend that we implement this before we implement any other external commands that use chained piping.

jmtsuji commented 4 months ago

@LeeBergstrand Thanks for making this issue and for the clarification. What you've proposed sounds quite feasible for adding a function that chains subcommands. I can revisit this and implement it before any other external commands with pipes are added. Also worthwhile to look into an alternative library like sh at that point.

LeeBergstrand commented 4 months ago

> @LeeBergstrand Thanks for making this issue and for the clarification. What you've proposed sounds quite feasible for adding a function that chains subcommands. I can revisit this and implement it before any other external commands with pipes are added. Also worthwhile to look into an alternative library like sh at that point.

Thanks, this sounds good to me!