phac-nml / refseq_masher

Mash MinHash search your nucleotide sequences against a NCBI RefSeq genomes database
Apache License 2.0
39 stars 4 forks source link

Bug Fix for When Mash Output is Empty #4

Closed emarinier closed 3 years ago

emarinier commented 3 years ago

This pull request addresses #3.

I've tried to keep the programming style mostly similar. I handle empty files by returning a None dataframe when the output is empty and then add logic to handle when dataframes are None.

peterk87 commented 3 years ago

Looks good to me!

If it's not too much trouble, the tiny FASTQ file could be added as test data and a test case could be added to tests/test_contains.py. Maybe something like this

def test_tiny_fastq(runner):
    fastq = 'tests/data/tiny.fastq'
    result = runner.invoke(cli.contains, ['-p', '4',
                                          fastq])

    assert result.exit_code == 0, 'Exit code should be 0'
    assert 'There were no matches found.' in result.error, 'Should be a log message indicating no matches were found.'
    assert len(result.output) == 0, 'No output results on empty data' # if that makes sense

I haven't tested the above.

emarinier commented 3 years ago

Sorry, wrote a comment and realized I was wrong about logging. I'm going to try updating the tests.

emarinier commented 3 years ago

Okay, what an adventure, but I found out how to check logs in testing:

    def test_small_fastq(self):
        fastq = 'tests/data/small.fastq'
        runner = CliRunner()

        with self.assertLogs(level='INFO') as cm:
            result = runner.invoke(cli.contains, ['-p', '4', fastq])
            assert "There were no matches found." in " ".join(cm.output)

        assert result.exit_code == 0, 'Exit code should be 0'
        assert len(result.stdout) == 0, 'No output results on empty data'