topdownproteomics / sdk

Software solution for common top-down proteomics tasks
http://www.topdownproteomics.org/
MIT License
9 stars 4 forks source link

UofW incorporation #55

Closed rfellers closed 4 years ago

rfellers commented 6 years ago

I took the isotopic generation methods from the UofW IsotopicDistribution class, copied them to FineStructureIsotopicGenerator.cs, and switched everything to use TDP.SDK interfaces/classes. This is the first (maybe only?) major step to incorporate UofW code into the library proper. Are there any other bits that you'd like to consider? I'd like to get to the point where we could completely remove the UWMadison folder (like I did for the Northwestern folder awhile back). No rush, please let's discuss ... I expect more commits before this PR is merged.

acesnik commented 6 years ago

I'm not familiar with this method, but the idea of using the same interfaces sounds good to me! My first thought is that reading a UniProt protein XML should be the next bit of code to consider merging in. @rmillikin, any thoughts? I'll have more time next week to think about what other methods we might want to bring into the SDK.

rfellers commented 6 years ago

Hey @acesnik, I agree that UniProt text and XML parsers would be attractive things to add in the future, but with this PR I'm focusing on removing the UWMadison folder. Those "institution" folders seem like a temporary thing to me that should be phased out. Do you agree? If so, is there anything in there worth grabbing now and incorporating? For example, one thing the IChemicalFormula in the TD.SDK doesn't support is isotopes (e.g. C{13}H2). How important are those for you? ... and do we need that for ProForma support?

acesnik commented 6 years ago

Oh, that makes more sense. Thanks for clarifying. Yes, I agree that the institution folders seem temporary.

I think isotopes are important. Isotopic labeling is a commonly used tool, and at some point we should be able to specify modifications with isotopes. There are examples with isotopes in the ProForma v1 paper.

rmillikin commented 6 years ago

do we want to duplicate mzLib code or simply add mzLib as a nuget package to this project? we do occasionally change how we read databases so we'd have to keep it updated in 2 different places. I know mzLib has a bunch of stuff in it (+1 to splitting it into different, smaller packages) but all the pitfalls of code duplication come to mind...

rfellers commented 6 years ago

@acesnik That's right now that you mention it. I'll look into adding unit tests to cover labeling and see if/how the chemical formula needs to be changed.

@rmillikin I'd vote that we don't take a dependency on mzLib. I was hoping this would be a library that takes the best parts of mzLib and internal Kelleher libraries (and anything else) and merges them all together. A chance to collectively start over if you will ... That said, you are of course correct to be concerned with code duplication across libraries. It is ultimately up to you what you do with mzLib, but I'd be happy to brainstorm about it on a future call.

rfellers commented 4 years ago

Closing this as it is now combined into #71