Closed wincowgerDEV closed 1 year ago
Sounds great! Let me know if you'd like any help.
You are welcome to take part and jump in however you would like. I will definitely be posting about challenges on this issue, and I'll @-mention you if neither of us knows what to do.
Cheers Win
A recent publication made some headway on this problem for us.
They have a GitHub page with the code here: https://github.com/EdsonCilos/mp_classification
It is in Python, but it will give us some exposure to the format of the models, and most of them can be implemented in R.
Great! I'll take a look and get more involved from next week
Hi guys, Finally I jumped in, to stay! I'm starting to understand the data structure and how to manage the collaborative work/communication through GitHub. I don't know the best place to have this kind of discussion, please feel free to correct and guide me whenever you want. Here I go:
Best, Aline
Hey Aline
Awesome! Glad to have you on board.
Yeah this is the right place to have these convos 🙂
Answers below:
That's right.
That would be awesome. I know there is another issue we opened where @Shreyas Patankar made some headway on that problem, so you might start by getting his code implemented.
Definitely! We should include those and we are spectra hunters. The main thing I'm working on right now is overhauling that database with about 5k new spectra we have been given by folks. As you can imagine it's taking some time to get them all formatted together but I'm about a quarter of the way there.
what is the best way to share my code through GitHub?
The best way to share it is to create a new branch of this repo and put the code where it belongs in the repo. I guess this could become a function that builds a hierarchical clustering model in the Open Specy package, in addition to the app, so we would probably start by implementing it as a function there. After the code is working, you'll submit a pull request, and Zacharias or I will review it and go back and forth with you on edits. It will then get implemented in the code once everyone is happy with it. We can set up a video call to walk through some of this if you would like.
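For anyone following along, here is a rough idea of what a hierarchical clustering function could look like. This is only a sketch in Python, not the script Aline shared and not an Open Specy function: it skips the PCA step, clusters the raw intensities directly with Euclidean distance and average linkage, and the names `average_linkage`, `hierarchical_clusters`, and `toy_spectra` are made up for illustration.

```python
from math import dist


def average_linkage(cluster_a, cluster_b, spectra):
    """Mean Euclidean distance over all cross-cluster pairs of spectra."""
    pairs = [(i, j) for i in cluster_a for j in cluster_b]
    return sum(dist(spectra[i], spectra[j]) for i, j in pairs) / len(pairs)


def hierarchical_clusters(spectra, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until
    only n_clusters remain. Returns lists of row indices into spectra."""
    if not 1 <= n_clusters <= len(spectra):
        raise ValueError("n_clusters must be between 1 and len(spectra)")
    clusters = [[i] for i in range(len(spectra))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        _, a, b = min(
            (average_linkage(clusters[a], clusters[b], spectra), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is still valid
    return clusters


# Toy intensities on a shared wavenumber grid (made-up values, not real spectra).
toy_spectra = [
    [0.1, 0.9, 0.2, 0.1, 0.0],
    [0.2, 1.0, 0.3, 0.1, 0.1],
    [0.0, 0.1, 0.2, 0.9, 0.3],
    [0.1, 0.2, 0.1, 1.0, 0.4],
]
print(hierarchical_clusters(toy_spectra, n_clusters=2))
```

The brute-force pair search is fine for a few hundred spectra but scales poorly. For a library of several thousand spectra, an optimized routine from an established statistics package would be the more realistic choice.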
That would be sweet! Do you think all reference data would have to have the metadata info for a routine like that to work or could some have it and some not have it? I ask because many of the spectra have poor metadata.
Warm Regards Win
On Thu, Nov 4, 2021, 4:26 AM ardcarvalho @.***> wrote:
Hi guys, Finally I jumped in, to stay! I'm starting to understand the data structure and how to manage the collaborative work/communication through GitHub. I don't know the best place to have this kind of discussion, please feel free to correct and guide me whenever you want. Here I go:
- Supposing the updated OpenSpecy database is the one available through get_lib(), we currently have 636 spectra, right?
- The spectrum_identity variable has several duplicate identities, e.g. "poly(ethylene terephthalate" and "poly(ethylene terepthalate". Is a more standardized classification conceivable? If so, I could try to label them and reduce this great variability, at least for the purposes of this issue.
- I'm collecting and diving into more specific references, such as the last pub you shared, and indeed a lot could be implemented in OpenSpecy and become available to the scientific community. I'm more familiar with multivariate analysis than machine learning, but willing to learn. About available spectra, such as the ones in ( https://github.com/EdsonCilos/mp_classification), is it possible to include them in our database? Should we also work as "spectra hunters"?
- I've started the exploratory analysis with a simple PCA followed by a hierarchical clustering (OpenSpecy_MVA_script.txt https://github.com/wincowgerDEV/OpenSpecy/files/7474226/OpenSpecy_MVA_script.txt)
- What is the best way to share my code through GitHub?
- Several other data analysis tools could be implemented in OpenSpecy, such as spectra differentiation through space and time or among experimental conditions. That is, using metadata info as explanatory variables. Happy to listen to your ideas!
Best, Aline
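To make the label standardization point concrete, here is one way near-duplicate identities could be flagged and merged. It is a hedged sketch in Python using only the standard library. The function names, the 0.95 similarity threshold, and the rule that the first spelling seen becomes the canonical one are all assumptions for illustration, and any automatic merge like this would still need a manual check before being applied to the real spectrum_identity values.

```python
import re
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase, trim, collapse whitespace, and close unbalanced parentheses."""
    label = re.sub(r"\s+", " ", label.strip().lower())
    # Append a ")" for every "(" that was opened but never closed.
    label += ")" * max(0, label.count("(") - label.count(")"))
    return label


def merge_labels(labels, threshold=0.95):
    """Map each raw label to a canonical label.

    The first normalized spelling seen becomes canonical; later labels whose
    similarity ratio reaches the threshold are mapped onto it.
    """
    canonical = []
    mapping = {}
    for raw in labels:
        norm = normalize(raw)
        match = next(
            (c for c in canonical
             if SequenceMatcher(None, norm, c).ratio() >= threshold),
            None,
        )
        if match is None:
            canonical.append(norm)
            match = norm
        mapping[raw] = match
    return mapping


# The two spellings quoted above, plus two made-up variants of another polymer.
raw_labels = [
    "poly(ethylene terephthalate",
    "poly(ethylene terepthalate",
    "Polystyrene",
    "polystyrene ",
]
for raw, clean in merge_labels(raw_labels).items():
    print(repr(raw), "->", repr(clean))
```

A high threshold keeps genuinely different polymers with similar names apart, at the cost of missing heavier misspellings, so the value would need tuning against the actual label list.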
Hey!
Great Win, things are becoming clear.
Indeed, Shreyas Patankar did great work on polymer categorization (#95). I'll work to adapt it and implement it in OS (https://github.com/Ocean-Wise/OpenSpecy_data_sorting), but it would be nice to apply it to the database that you're working on.
The following step would be towards what was done by Back et al. (the pub you sent, https://doi.org/10.1016/j.chemosphere.2021.131903): developing the model to better identify unknown spectra. They propose an interesting pipeline and worked with a dataset of 958 spectra. It would be nice to develop ours with the 5k+ database, but I'll start the code with what we have now.
But beyond improving the spectrum identification, I seek the implementation of data analysis tools (perhaps it should be another issue). The user would be the one to enter the metadata to perform the analysis (such as experimental conditions), so I don't think the metadata of our dataset will be important.
We could definitely chat =) Let's set up a video call in late November; give me some time to get more into the whole issue. =p
Best, Aline
Hey Aline,
By the time you have a first implementation of the code running in OS, I should have the new database up and running. I will put this at the top of my priorities.
I agree that the new data analysis tools you are thinking of with experimental conditions as inputs to spectral analysis should probably be a new issue. I will send you an email for the video call for later November soon :)
Let me know if you have any other questions in the meantime.
Warm Regards, Win
Hey folks, I recently spoke with Win about getting some hyperspectral data on OpenSpecy. I've built a model for classifying spectra using SIMCA (a PCA-based method), which has worked very well. I haven't used SVM, but I have colleagues who work with it, and I know from the literature that it performs very well. If you'd like to chat, just let me know. I'll also be sharing my code for the models I work with once I get it cleaned up a bit.
Looking forward to collaborating :)
Glad you're in, @ardcarvalho!
If you like, you can use GitHub Gists for prototyping (https://gist.github.com/). In addition, you can fork our repo and work on a new branch of that fork from within your private account. Once you have fleshed it out, you can create a pull request from that fork. That's a little easier for us to manage than creating branches here.
This is the so-called "Fork and Pull Request" workflow; see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork for details.
Great, thank you @zsteinmetz !
By the time you have a first implementation of the code running in OS, I should have the new database up and running. I will put this at the top of my priorities.
That would be really nice!
I agree that the new data analysis tools you are thinking of with experimental conditions as inputs to spectral analysis should probably be a new issue. I will send you an email for the video call for later November soon :)
Yeah, let's discuss better how to organize both projects.
Let me know if you have any other questions in the meantime.
Thanks, Win! See you soon
Following up on this, the package and app now support multinomial classification 🎉.
Going to close this for now, we can open a new issue when we want to make another push on model dev depending on need/requests from users.
@ardcarvalho and @wincowgerDEV are working on developing a predictive model for identifying spectra, starting with PCA. The end goal is to develop a model that can accurately identify any raw, unprocessed spectrum. This model will speed up identification time and allow us to rapidly expand our resources. If we use an interpretable model, we may also be able to better understand which peaks are most important for identification. Ideally, the model accuracy will be greater than 90%, which is the current accuracy of our default settings. This product is ripe for publication if we manage to pull it off and could have wide implications beyond Open Specy. The model will eventually be folded into the Open Specy package as a function (as long as the model file size isn't too large) and offered as a feature in the online version of the tool.
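To compare any new model against the 90% target, we need a repeatable way to score accuracy. The sketch below, in Python, shows one possible baseline: identify each spectrum by its highest Pearson correlation with the remaining library spectra and report the leave-one-out accuracy. This thread does not say how the 90% figure was measured, so the correlation-based matching, the leave-one-out scheme, the function names, and the toy library are all assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat spectrum carries no shape information
    return sxy / sqrt(sxx * syy)


def identify(spectrum, library):
    """Return the label of the library spectrum with the highest correlation."""
    best_label, _ = max(library, key=lambda item: pearson(spectrum, item[1]))
    return best_label


def leave_one_out_accuracy(library):
    """Fraction of spectra identified correctly when each one is held out
    in turn and matched against the rest of the library."""
    hits = 0
    for i, (label, spectrum) in enumerate(library):
        rest = library[:i] + library[i + 1:]
        hits += identify(spectrum, rest) == label
    return hits / len(library)


# Toy library of (label, intensities) pairs; values are made up.
toy_library = [
    ("polyethylene", [0.1, 0.9, 0.2, 0.1, 0.0]),
    ("polyethylene", [0.2, 1.0, 0.3, 0.1, 0.1]),
    ("polystyrene", [0.0, 0.1, 0.2, 0.9, 0.3]),
    ("polystyrene", [0.1, 0.2, 0.1, 1.0, 0.4]),
]
print("leave-one-out accuracy:", leave_one_out_accuracy(toy_library))
```

Scoring every candidate model with the same held-out procedure would make the comparison to the default settings fair, whichever model family ends up being used.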
Steps
Some other model options that might work: