Closed jmtsuji closed 7 months ago
@jmtsuji, I noticed that you tended to space out some of your loops and statements.
# Get the original (pre-stitched) contig sequence as a SeqRecord
with open(input_fasta_filepath) as fasta_handle:
for record in SeqIO.parse(fasta_handle, 'fasta'):
sequence_names.append(record.name)
if record.name in subset_sequence_ids:
yield record
or
if circlator_info['circularised'].sum() / circlator_info.shape[0] == 1:
logger.debug('Everything is stitched.')
result = True
elif circlator_info['circularised'].sum() >= 0:
logger.debug('Not everything is stitched.')
result = False
else:
logger.error('File processing error. # of non-stitched contigs is not >=0. Exiting...')
raise RuntimeError
When indenting, most developers I know do not put blank lines between each line of code. Instead, they blank lines to break out the code into logical blocks. The extra blanks make the code longer to scroll through, which takes time.
For Example:
There is only a blank line above the if statement to break it out from the rest of the loop. In this case, you could even remove that blank line, and the code would still be readable.
# Get the original (pre-stitched) contig sequence as a SeqRecord
with open(input_fasta_filepath) as fasta_handle:
for record in SeqIO.parse(fasta_handle, 'fasta'):
sequence_names.append(record.name)
if record.name in subset_sequence_ids:
yield record
or
Here, we remove the blank lines from the if elif else clause and break it out from the other code with lines. This makes it look like a switch statement, which it functionally is. As you can see, the indenting makes the code fairly readable, even with the blank lines removed.
circlator_info = pd.read_csv(circlator_logfile, sep='\t')[['#Contig', 'circularised']]
if circlator_info['circularised'].sum() / circlator_info.shape[0] == 1:
logger.debug('Everything is stitched.')
result = True
elif circlator_info['circularised'].sum() >= 0:
logger.debug('Not everything is stitched.')
result = False
else:
logger.error('File processing error. # of non-stitched contigs is not >=0. Exiting...')
raise RuntimeError
return result
@jmtsuji, I noticed that you tended to space out some of your loops and statements.
@LeeBergstrand Thanks for these detailed tips -- they are very helpful! I've tried making my code styling more compact in my new updates to this branch -- let me know your thoughts.
@LeeBergstrand Thanks for your helpful last round of comments! I've made several updates to this PR after receiving your advice from the last code review and from #2. Most previous comments are now addressed (although you'll see a couple discussion threads above that are still active).
As one unexpected update, I ended up completing a basic working version of the rotate.py
module and added it to this PR. (Because I accidentally introduced an incomplete version of this module into #2 as mentioned previously, I figured it would be better to just get a basic working version of the rotate.py
module into this PR before this PR gets merged to main, rather than leave an incomplete version of this module in the code base.) The repair and rotate modules pass end-to-end testing on my end, based on almost the latest commit. I've updated the description at the top of this PR to reflect that rotate
is now included.
Look forward to your review of this updated version -- thanks again for your input.
def generate_bed_file(contig_seqrecord: SeqIO.SeqRecord, bed_filepath: str, length_threshold: int = 100000):
"""
Generates a BED file for a desired region around the ends of a contig.
:param contig_seqrecord: SeqRecord of the contig
:param bed_filepath: desired output filepath for the BED file
:param length_threshold: length (bp) around the contig end to target in the BED file.
Half of this length will be returned around each end of the contig.
:return: writes the BED file to the bed_filepath
"""
contig_name = contig_seqrecord.name
contig_length = len(contig_seqrecord.seq)
logger.debug('Making BED file with end proximity threshold of ' + str(length_threshold))
if contig_length < length_threshold:
logger.warning(f'Contig length ({contig_length}) is less than the supplied length threshold '
f'({length_threshold}), so the BED file will be for the whole contig.')
The :return:
is generally only specified when the function returns a return value (i.e., the function returns a value later assigned to a variable in the calling program). Above and in several other places, you specify return value descriptions for functions that do not return values (i.e., there is no return
at the end of the function). If we decide to do readthedocs down the road, the code for readthedocs parses the contents of the function doc string and turns it into a website. It could get confusing if the documentation starts specifying return values that don't exist. Instead, I would write the text you specify in the return value description to the function description.
def map_long_reads(contig_filepath: str, long_read_filepath: str, output_bam_filepath: str, log_filepath: str,
append_log: bool = True, threads: int = 1, threads_mem_mb: float = 1):
"""
Maps long reads (via minimap2) to contigs and sorts/indexes the resulting BAM file. Writes output_bam_filepath and log_filepath to disk.
"""
The
:return:
is generally only specified when the function returns a return value (i.e., the function returns a value later assigned to a variable in the calling program). Above and in several other places, you specify return value descriptions for functions that do not return values (i.e., there is noreturn
at the end of the function). If we decide to do readthedocs down the road, the code for readthedocs parses the contents of the function doc string and turns it into a website. It could get confusing if the documentation starts specifying return values that don't exist. Instead, I would write the text you specify in the return value description to the function description.
@LeeBergstrand Thanks for the advice! I've now cleaned up the way that return
is described in the docstrings, as you've suggested.
@LeeBergstrand I've implemented the code changes you've suggested and look forward to receiving your updated review. I would be OK to wait on changing the package name until a separate PR, if this is OK with you. Thanks!
@LeeBergstrand I've implemented the code changes you've suggested and look forward to receiving your updated review. I would be OK to wait on changing the package name until a separate PR, if this is OK with you. Thanks!
Changing the name in another PR works for me.
Merging into master as discussed in #14 . Thanks for your help with these improvements @LeeBergstrand !
In this PR, the basic skeleton of the rotary-utils package was set up with two utilities, the
repair
utility and therotate
utility. (Main, where I want to merge this PR, currently just contains the rawrepair.py
file ported from rotary.)Major updates included in this PR:
meta.yml
,pyproject.toml
, andenvironment.yml
files are now present for installation.rotary_utils.py
, was created.repair.py
toutils.py
rotate.py
, was added. This module handles basic operations for rotating circular DNA elements and will be helpful for future code modules I will add. For now, a basic working version is included with a couple stub functions that will be expanded in future. Some functionality fromrotate.py
is used inrepair.py
.rotary-utils repair
androtary-utils rotate
pass end-to-end tests on my machine.@LeeBergstrand Looking forward to review. Once this basic framework for rotary-utils is in place, I can start to add other modules into rotary-utils.