phac-nml / biohansel

Rapidly subtype microbial genomes using single-nucleotide variant (SNV) subtyping schemes
Apache License 2.0
25 stars 7 forks source link

Adds #N/A in subtype field of results when no subtype is found #115

Closed glabbe closed 4 years ago

glabbe commented 4 years ago

This change completes the partial fix from PR #81 , which only handled cases where there were no kmers found. This addition now also handles cases where only negative kmers are found, and no subtype is found (Issue #112 ).

DarianHole commented 4 years ago

There is also the other output results that should have this fixed

https://github.com/phac-nml/biohansel/blob/8bc01519a72fabf0fc1d840e1c0d5a51a4957a65/bio_hansel/main.py#L269

glabbe commented 4 years ago

@peterk87 @DarianHole I did more tests and found problems with creating the Simple summary output after these changes, so this issue is not fully fixed yet. Still working on it.

glabbe commented 4 years ago

Justin found the source of this warning obtained in the command-line: "A value is trying to be set on a copy of a slice from a DataFrame."

The warning goes away when we comment out the lines 56 and 57 in metadata.py, and it did not cause any adverse effect in the test runs that I performed so far: @peterk87 @DarianHole can we remove these two lines from the code?

Code block including the two (lines 56 and 57) in metadata.py:

def merge_metadata_with_summary_results(df_results: pd.DataFrame, df_metadata: pd.DataFrame) -> pd.DataFrame: """Merge subtype metadata table into subtype results table. Args: df_results: Subtyping results table. df_metadata: Subtype metadata table. Returns: Subtyping results with subtype metadata merged in if metadata is present for subtype results. """ (line 56) df_results.subtype = df_results.subtype.fillna('') (line 57) df_results.subtype = df_results.subtype.astype(str) return pd.merge(df_results, df_metadata, how='left', on='subtype')

peterk87 commented 4 years ago

I think it should be safe to remove those lines since any NA values are converted to the string value #N/A in https://github.com/phac-nml/biohansel/blob/e1139898ed3dcd5e5c3931e79d3300a073e1b967/bio_hansel/main.py#L248 prior to calling merge_metadata_with_summary_results

glabbe commented 4 years ago

Will make a new pull request with the changes suggested by Peter and Darian