tyjo / coptr

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing
GNU General Public License v3.0
16 stars 5 forks source link

Some more ideas #6

Open Midnighter opened 3 years ago

Midnighter commented 3 years ago

Having used CoPTR for a little bit and looked through the source code briefly, I had a few ideas for changes:

  1. I had a need for building a large index which required me to pass the option to bowtie2. This gave me the idea: What if instead of creating explicit options on coptr that are passed to bowtie2, the argument parsing is changed such that all options after -- are directly given to bowtie2? Or alternatively, all unrecognized options could be passed to bowtie2 if there are no conflicting option names. I think that's more flexible and future proof. So something like:

    coptr map index fastq output -- --threads 10 --mm --seed 1234
  2. Your print functions somewhat re-invent logging levels. It should be quite easy to replace them.

  3. I saw that you use subprocess calls to rm to delete files. Using os.remove would make that platform independent.

Let me know if any of those proposals sound good to you and I can create PRs for them.

tyjo commented 3 years ago

These are all great suggestions, thanks! Please do create PRs if you have time.

Midnighter commented 3 years ago

Cool, I've opened #7 and #8 to address 2. and 3.

I think for the first point, I would like to introduce click and do some further refactoring of the code in general. Not sure yet if I want to invest the time right now but if I do, the changes will be rather extensive.

I was wondering about the PoissonPCA class, is there maybe some code in sklearn that could be used here? It's always nice to include widely used and tested code that is maintained by someone else 😉

tyjo commented 3 years ago

I'll take a look #7 and #8.

I think for the first point, I would like to introduce click and do some further refactoring of the code in general. Not sure yet if I want to invest the time right now but if I do, the changes will be rather extensive.

For the first point, I'm not sure it's worth the effort to swap argparse for click. A simpler (though less robust) solution might be to split command line arguments on -- and pass them to the subprocess call to bowtie2.

I was wondering about the PoissonPCA class, is there maybe some code in sklearn that could be used here?

I'm not aware of an equivalent implementation from sklearn (sklearn has several algorithms for non-negative matrix factorzation, which isn't quite what we're doing here). I'm hesitant to swap out this piece, because the benchmarks from the paper show that it performs reasonably well.

Midnighter commented 3 years ago

Thanks for merging! I wanted to add that it's completely feasible to change the logger names. What I used:

logger = logging.getLogger(__name__)

is just a lazy way to get hierarchical names directly from the module names. You can easily put any name in there although I recommend always starting with "coptr" because that will then be the parent of all loggers in the package. So changing the log level of "coptr" will also affect "coptr.ReadMapper".