lczech / gappa

A toolkit for analyzing and visualizing phylogenetic (placement) data
GNU General Public License v3.0
56 stars 7 forks source link

allow-file-overwriting flag not working for "examine edpl" #21

Closed bowmanjeffs closed 1 year ago

bowmanjeffs commented 1 year ago

Hello, reporting possible bug: gappa examine edpl --allow-file-overwriting --out-dir test --file-prefix test --jplace-path test/test.jplace

returns the error:

Warning: Output file already exists: test.edpl_list.csv
Found 1 jplace file
Writing output files.
Error: __FUNCTION__: invalid_argument

terminate called after throwing an instance of 'std::invalid_argument'
  what():  __FUNCTION__: invalid_argument
Aborted (core dumped)

It seems that perhaps the command is not recognizing the --allow-file-overwriting flag? An identically structured command for examine assign works as expected.

lczech commented 1 year ago

Hi @bowmanjeffs,

thanks for reporting this!

The --allow-file-overwriting is recognized correctly (at least, it's not the problem here). The problem is in one of the functions for producing the output of the EDPL command. That's why it is working with the examine assign command, but not with the examine edpl command.

Can you please share your test.jplace file with me, so that I can investigate where the error is happening exactly? I'm fairly certain that it is happening because some part of the EDPL computation is producing an empty result, and there is a bug in the code when trying to print the output for an empty input... but I need to check which part exactly is causing that.

Cheers and so long Lucas

bowmanjeffs commented 1 year ago

Lucas, here you go! Thanks for taking a look.

2111SR_20211107_40_16_40m_900mL_S8_L001.12SV5.exp.Bacillariophyta_2836.jplace.zip

lczech commented 1 year ago

@bowmanjeffs,

thank you for sharing the file! I see: your placement file only has a single pquery, with a single placement location, that is, a single branch that it has been placed on. Hence, a measure such as the EDPL, which measures the expected distance between the different placement locations of a pquery, will just output nothing, as there is only a single location, so nothing to compute a distance between.

Hence, for that particular test file, the output is expected to be N/A. Still, that bug should not occur, and instead a proper warning or N/A as output should be produced. I'll fix that soon™ - but in the meantime, I thought it might be useful for you to know that this file does not have a meaningful EDPL :-)

Cheers and so long Lucas

bowmanjeffs commented 1 year ago

Thanks Lucas! This runs in a loop with many library subsets placing to many trees, so that's expected for some of them. I should have anticipated that :). That particular result is captured as NaN by default.

Best, Jeff

lczech commented 1 year ago

Okay, fixed in the latest commit. Instead of the program throwing an exception, it now issues a warning about the situation, but still produces a valid (yet mostly empty) histogram.

Closing the issue for now - feel free to re-open should you have any further trouble with this!