Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.88 T=0.17 s (736.8 files/s, 57502.6 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
TypeScript 37 193 335 2062
Vuejs Component 23 40 35 1878
Jupyter Notebook 4 0 1407 758
Python 3 90 101 609
YAML 11 57 30 501
JSON 11 2 0 307
Markdown 9 114 0 279
Bourne Shell 6 29 9 248
TeX 1 14 0 182
make 5 22 14 74
SVG 1 0 0 56
JavaScript 6 0 6 53
HTML 2 4 2 36
Dockerfile 2 8 1 24
XML 1 0 0 9
Sass 1 1 2 8
-------------------------------------------------------------------------------
SUM: 123 574 1942 7084
-------------------------------------------------------------------------------
gitinspector failed to run statistical information for the repository
Wordcount for paper.md is 1335
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.5281/zenodo.4154370 is OK
- 10.48550/ARXIV.2003.06975 is OK
MISSING DOIs
- 10.1186/s43591-022-00035-1 may be a valid DOI for title: Trash Taxonomy Tool: harmonizing classification systems used to describe trash in environments
- 10.1186/s40965-018-0050-y may be a valid DOI for title: OpenLitterMap.com – Open Data on Plastic Pollution with Blockchain Rewards (Littercoin)
INVALID DOIs
- https://doi.org/10.1029/2019EA000960 is INVALID because of 'https://doi.org/' prefix
- https://doi.org/10.1016/j.wasman.2021.12.001 is INVALID because of 'https://doi.org/' prefix
@domna, @luxaritas – This is the review thread for the paper. All of our communications will happen here from now on.
Please read the "Reviewer instructions & questions" in the first comment above. Please create your checklist by typing:
@editorialbot generate my checklist
As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.
The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/5136 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.
We aim for the review process to be completed within about 4-6 weeks but please make a start well ahead of this as JOSS reviews are by their nature iterative and any early feedback you may be able to provide to the author will be very helpful in meeting this schedule.
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Just finished an initial pass looking over this, and wanted to share a few high-level comments (not submitting issues at this point because I'm not sure how much is actionable in that context, but I'm happy to for anything where it would be useful):
I'm on the fence about whether this qualifies under the JOSS substantivity requirement. On one hand, I'm definitely on board with the value of having an intuitive interface to do this work. On the other hand, I'm not fully sold on the value as currently implemented: the current service lacks substantial "workflow" tools (e.g., some sort of projects/tagging, multiple collaborators, additional analytics, ...) and some level of polish/UX work, to the degree that it almost feels like a Google Colab notebook could serve the purpose more efficiently (point it to a folder of images, run the notebook, get some summary stats back out).
It also seems like the cloud deployment approach here doesn't provide much value for someone wanting to deploy it on their own - this seems like a great candidate to deploy to something like GitHub Pages, Netlify, etc., whereas the current approach is pretty complex for what it is. The only thing that prevents it from being deployed in that way is the collection of images to feed back to the model, which hasn't actually been followed through on yet. Even with the current structure, I'll also note that the deployment scripts and such could probably be simplified and consolidated a bit - when taking a brief look at it, I felt like I had to jump around a bit to find all the different bits and pieces, and there were a lot of different calls between different tools (e.g., why use makefiles vs just having a single docker-compose in the root and using docker-compose up?).
Additionally, while I know the model isn't yours, when I sampled a couple of images, the model seemed sufficiently unreliable, at least to me, that I'm not confident I would trust it to do the analysis for me as a researcher. I also have concerns about whether it would be fast enough, just running locally, to analyze the number of images that would be needed in a research context (though not being a subject matter expert, I can't make that judgement with any real confidence).
To be clear I definitely think this has the promise to be really useful, but I'm not sure it hits the mark as-is.
Hey @luxaritas, Thanks for the initial pass. I am creating some issues now based on these helpful comments and have responses below. Will update you when the issues are fixed.
It does not appear the submitting author (@wincowgerDEV) made significant contributions to the software itself - the only commits I see attributed are the paper and a minor tweak to deployment configurations
JOSS submission guidelines require that the submitting author be a major contributor to the software: https://joss.readthedocs.io/en/latest/submitting.html. You are correct that I have not contributed directly to the source code. However, there are many ways to contribute to software development beyond contributing to source code. On any software team, there are typically testers, documenters, project managers, source code developers, and many other roles. I have contributed over 100 hours to this project by contributing to documentation, testing, conceiving of the project, and planning the project. I also led the writing of the manuscript. Additionally, the corresponding author, @shollingsworth, is the primary contributor to the source code.
I'm on the fence about whether this qualifies under the JOSS substantivity requirement. On one hand, I'm definitely on board with the value of having an intuitive interface to do this work. On the other hand, I'm not fully sold on the value as currently implemented: the current service lacks substantial "workflow" tools (e.g., some sort of projects/tagging, multiple collaborators, additional analytics, ...) and some level of polish/UX work, to the degree that it almost feels like a Google Colab notebook could serve the purpose more efficiently (point it to a folder of images, run the notebook, get some summary stats back out).
I disagree with this comment. Most researchers in my field would not know how to use a Google Colab notebook to do this work. The fact that computer vision is almost solely run using programming languages is preventing its adoption in trash research; this is the premise of our work. I am also not aware of any needs researchers would have for the additional workflow tools you are mentioning, but I would be happy to consider them if you can provide some specifics.
During the prereview some of these questions came up from @arfon as well and I had additional responses there: https://github.com/openjournals/joss-reviews/issues/5005
Please refer to the JOSS submission guidelines here for the substantive requirement and consider our in-line responses on why we believe we meet it: https://joss.readthedocs.io/en/latest/submitting.html. The guidelines state these requirements: Age of software (is this a well-established software project) / length of commit history.
It also seems like the cloud deployment approach here doesn't provide much value for someone wanting to deploy it on their own - this seems like a great candidate to deploy to something like GitHub pages, Netlify, etc whereas the current approach is pretty complex for what it is. The only thing that prevents it from being deployed in that way is the collection of images to feed back to the model, which hasn't actually been followed through on yet. Even with the current structure, I'll also note that the deployment scripts and such could probably be simplified and consolidated a bit - when taking a brief look at it, it felt like I had to jump around a bit to find all the different bits and pieces and had a lot of different calls between different tools (eg, why use makefiles vs just having a single docker-compose in the root and use docker-compose up?).
Thanks for this comment, I just created an issue for the second part of this comment: https://github.com/code4sac/trash-ai/issues/106. For the first part, I do not believe that deployment to proprietary infrastructures is within the primary scope of JOSS: https://joss.readthedocs.io/en/latest/submitting.html#submissions-using-proprietary-languages-development-environments. We made a local deployment option: https://github.com/code4sac/trash-ai/blob/production/docs/localdev.md which uses a simple docker-compose deployment. Have you tried to use this?
Additionally, while I know the model isn't yours, when I sampled a couple images, it seems like the model is sufficiently unreliable at least to me that I'm not confident I would trust it to do the analysis for me as a researcher. I also have concerns about whether it would be fast enough just running locally to analyze the amount of images that would be needed in a research context (though not being a subject matter expert, I can't make that judgement with any real confidence).
We did create the model and are working to improve it. However, the model itself isn't within the scope of a JOSS review to my knowledge; it would need to be reviewed by a machine learning journal. We are aware that it can still be improved in accuracy, and that was mentioned in the video. One of the main challenges with improving its accuracy is a lack of labeled images from the diversity of possible settings that trash can be in. We are hoping that people will use the platform and share images to it so that we can relabel them and improve the model in the long term. Have you attempted to use the local deployment option? It is extremely fast and runs asynchronously, so it can be run in the background.
To be clear I definitely think this has the promise to be really useful, but I'm not sure it hits the mark as-is.
Thanks for the kind words.
While the video walkthrough is great, it would probably be best to have a bit more written instruction on usage in the README/on the website. I think the setup documentation could also be improved a bit. The local setup refers to specifics of WSL a lot where it could be more system-agnostic, and while the usage of GH workflows is novel, the AWS deployment instructions are a bit nonstandard in that regard and seem to assume that you're deploying it from the repo itself/as a repo owner.
Thanks for the kind words. I created this issue to address these points: https://github.com/code4sac/trash-ai/issues/107
There doesn't seem to be any automated tests. While the desired behavior here is relatively clear/trivial, there's no formal instructions to verify behavior.
Created this issue: https://github.com/code4sac/trash-ai/issues/108. There are many automated tests baked into the Docker workflow, but you are correct that they should be clearer in the documentation and that there should be a formal validation procedure.
As tagged by editorialbot, it looks like there are some citation formatting issues
Thanks for pointing that out. Created this issue to resolve them: https://github.com/code4sac/trash-ai/issues/109
Thanks for the clarifications @wincowgerDEV!
You are correct that I have not contributed directly to the source code. However, there are many ways to contribute to software development beyond contributing to source code. On any software team, there are typically testers, documenters, project managers, source code developers, and many other roles. I have contributed over 100 hours to this project by contributing to documentation, testing, conceiving of the project, and planning the project. I also led the writing of the manuscript.
Understood - thanks for the additional detail to help verify.
I disagree with this comment. Most researchers in my field would not know how to use a Google Colab notebook to do this work. The fact that computer vision is almost solely run using programming languages is preventing its adoption in trash research, this is the premise of our work. I also am not aware of any needs they would have for these additional workflow tools that you are mentioning but would be happy to consider them if you can provide some specifics.
My perspective on this is limited due to my limited knowledge of this field, so I appreciate your perspective. I will note that it should be possible to use a Colab notebook without actually having to be familiar with the programming itself, though I recognize that it's not the most user-friendly experience (and on investigating the approaches I had in mind, I think it's probably not as good an option as I thought!).
Thanks for this comment, I just created an issue for the second part of this comment: https://github.com/code4sac/trash-ai/issues/106. For the first part, I do not believe that deployment to proprietary infrastructures is within the primary scope of JOSS: https://joss.readthedocs.io/en/latest/submitting.html#submissions-using-proprietary-languages-development-environments. We made a local deployment option: https://github.com/code4sac/trash-ai/blob/production/docs/localdev.md which uses a simple docker-compose deployment. Have you tried to use this?
Yes, I have, and it works great. However, I still think this is at least somewhat relevant. In order for this package to be useful to non-technical researchers, it has to be deployed. That can either be serviced by your own hosted instance (this is great, and I think is a real value add based on your statement of need, though if I incorporate that into my consideration of amount-of-work then I should include it as at least some small component of the review), or by having it self-deployed (in which case, someone else needs a clear way to host it, whereas right now your options are either 1) what is designed to be a local development environment or 2) a cloud-based deployment workflow which is hard to adapt for third-party use - and in both cases there are a bunch of components that a typical install wouldn't really need). Additionally, I would consider ease of self-deployment as part of a high-quality open-source package, and if you include this as part of your primary documentation (as you do in the README) I would consider it something which should be functional and understandable for an end-user (rather than an internal detail of the repo).
We did create the model and are working on it to improve it. However, the model itself isn't within the scope of a JOSS review to my knowledge, it would need to be reviewed by a machine learning journal. We are aware that it can still be improved in accuracy and that was mentioned in the video. One of the main challenges with improving its accuracy is a lack of labeled images from the diversity of possible settings that trash can be in. We are hoping that people will use the platform and share images to it so that we can relabel them and improve the model in the long term.
Ah, I misunderstood what I read previously - I see now.
While the model itself is not directly within the scope of JOSS, I think it is still somewhat relevant - without a usable model, the value of your application to researchers in the way you describe can't really be realized. I don't intend for this to be a critical component of the review itself, but more an additional data point to other aspects of the review. This is especially true because you've trained the model yourself - the model you're providing is part of the product being delivered (and so has additional implications for the "functionality" requirements).
Have you attempted to use the local deployment option? It is extremely fast and runs asynchronously so it can be run in the background.
I was just basing this off of what I saw in the demo video, as I don't have a lot of sample data to work with myself, so my comment on performance is somewhat speculative - again I don't have knowledge of how this would be used in the wild, so my intent was to flag that as something I could see as being an issue, but don't have the knowledge to verify myself. If you've found the performance to be sufficient with your understanding of the use cases, I don't have any real concern.
@luxaritas Thanks for the thoughtful response back. Some responses below to follow up. We will get to work on these aspects and circle back when they are ready.
My perspective on this is limited due to my limited knowledge of this field, so I appreciate your perspective. I will note that it should be possible to use a Colab notebook without actually having to be familiar with the programming itself, though I recognize that it's not the most user-friendly experience (and on investigating the approaches I had in mind, I think it's probably not as good an option as I thought!).
I completely agree that it is possible to do all of this within a Colab notebook or some other programmable interface. Will share with the group to think some more about how we can better integrate this application with programmable interfaces. I think it will have to be part of a longer term development timeline though since several other workflows exist.
Yes, I have, and it works great. However, I still think this is at least somewhat relevant. In order for this package to be useful to non-technical researchers, it has to be deployed. That can either be serviced by your own hosted instance (this is great, and I think is a real value add based on your statement of need, though if I incorporate that into my consideration of amount-of-work then I should include it as at least some small component of the review), or by having it self-deployed (in which case, someone else needs a clear way to host it, whereas right now your options are either 1) what is designed to be a local development environment or 2) a cloud-based deployment workflow which is hard to adapt for third-party use - and in both cases there are a bunch of components that a typical install wouldn't really need). Additionally, I would consider ease of self-deployment as part of a high-quality open-source package, and if you include this as part of your primary documentation (as you do in the README) I would consider it something which should be functional and understandable for an end-user (rather than an internal detail of the repo).
Happy for any comments you have about the remote hosting options, and we will do our best to incorporate them. Agreed that this is the option that most developers will want to use. I absolutely agree that the primary documentation should lead to functional and understandable self-deployment. If you end up running into any issues related to that, please let us know and we will fix them.
While the model itself is not directly within the scope of JOSS, I think it is still somewhat relevant - without a usable model, the value of your application to researchers in the way you describe can't really be realized. I don't intend for this to be a critical component of the review itself, but more an additional data point to other aspects of the review. This is especially true because you've trained the model yourself - the model you're providing is part of the product being delivered (and so has additional implications for the "functionality" requirements).
Agreed here too! It is our highest development priority right now to improve the model. Certainly, if the model accuracy is limiting the functional use of the tool, then it's an issue we should resolve in this tool, and I don't think that the only resolution is to improve the model accuracy. We currently report the model confidence in the labels, which I believe helps the user interpret the accuracy. Perhaps there is another aspect of the software you think would be helpful to add for a user who is facing functionality-related issues?
I was just basing this off of what I saw in the demo video, as I don't have a lot of sample data to work with myself, so my comment on performance is somewhat speculative - again I don't have knowledge of how this would be used in the wild, so my intent was to flag that as something I could see as being an issue, but don't have the knowledge to verify myself. If you've found the performance to be sufficient with your understanding of the use cases, I don't have any real concern.
Appreciate your flexibility on this point.
Hi @wincowgerDEV and @luxaritas ,
thank you for the detailed discussion. It already addressed a lot of the points I had while reviewing the software. However, I'd like to add my view on a few points and share some additional thoughts.
I agree with @luxaritas's view on the deployment, that the software should be easily deployable to any hosting solution or run locally. While you provide a localdev option, I see it as an option for doing local development for an AWS-based system rather than as a full-fledged deployment solution. From my view, the localstack containers and backend upload options are not really necessary for a local deployment.
What I would find nice to have would be:
- Documentation on how to compile the front-end and directly upload it to any hosting solution (ideally with disabled backend upload option, so you really only need a webspace or even could use gh pages).
- A single docker container which contains the compiled gui so admins can easily deploy in a docker based system. It would also be nice if this container would be uploaded to gh packages or dockerhub, so you could just provide a docker-compose file people could use to deploy the service.
While it is stated in the paper that some of the images are uploaded to an S3 storage, there is no information about it on your trashai.org webpage. I think it is at least necessary to put a disclaimer to users that their image data is uploaded and that images may contain sensitive information through EXIF data (because I think the targeted audience is not always aware of what information images may contain). In my view it would even be better to have an opt-out option for image upload, and/or to strip unnecessary parts of the EXIF data while uploading, and/or to inform users of the uploading process to AWS while also putting information resources on EXIF data directly on the page.
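Just to illustrate the EXIF-stripping idea (this is only a sketch and not how trash-ai currently handles uploads; the filenames are made up), re-saving only the pixel data drops the metadata block:

```python
# Sketch only: drop EXIF metadata (GPS, device info, ...) from an image
# before it is shared or uploaded. Filenames are placeholders.
from PIL import Image

with Image.open("photo_with_gps.jpg") as img:
    pixels = list(img.getdata())
    clean = Image.new(img.mode, img.size)
    clean.putdata(pixels)              # pixel data only, no EXIF block
    clean.save("photo_stripped.jpg")
```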
I find your video tutorial very nice and good to follow. Additionally, I think it would be nice if you'd also have a folder of example data with which new users could directly follow your tutorial. Maybe even with a post-processing example of json data.
I agree with @luxaritas that the written documentation could be expanded. What I miss the most is a detailed explanation of the json data structure for post-processing software. Which fields are expected and what information do they contain? Are there fields which are not available for all data? As I am not familiar with trash research, I was also wondering whether there is some standardized format for data exchange on trash location data and labelling which could be used here (I think the Hapich et al 2022 paper is elaborating on this, so it would be nice to read some more details on the connection of this trash taxonomy to the data format of trashai). Also, I understand that your targeted audience is less technical, so dealing with json data may be a barrier. I think it would also be good to offer a pdf based overview (like an analysis page which could just be printed into a pdf - so users could directly have a map overview of their trash expedition).
I was missing information on which software people typically use for researching trash images. Since I'm not familiar with the field I may not be aware that such software is not really available, but I think it would be useful to give a glimpse into the working process around your tool (like: what do I actually do with the downloaded json data?). It would also be interesting to know whether there are some widely known databases where you can put your trash data, as I think the useful part of such information is putting a lot of data from different researchers together to give a broad overview of littering places. I think especially it would be nice to have some guidance what to do with the json data downloaded by your tool. Is there some analysis software for that?
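To make the last point concrete, something like the sketch below is what I have in mind; the field names ("images", "detections", "label") are purely hypothetical, since the actual output schema is exactly what I would like to see documented:

```python
# Hypothetical post-processing sketch: count detected trash labels in the
# downloaded JSON. Field names are illustrative, not the real schema.
import json
from collections import Counter

with open("trash_ai_output.json") as f:
    results = json.load(f)

counts = Counter()
for image in results.get("images", []):
    for detection in image.get("detections", []):
        counts[detection["label"]] += 1

for label, n in counts.most_common():
    print(f"{label}: {n}")
```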
I was also very happy to read that you plan to upload your data to the taco dataset. Do you also plan on adding a feature where users can directly annotate their data in trashai and then upload the annotation together with the images to trashai? I think this could be a powerful tool for advancing the taco dataset, since it would target the erroneous classifications of the model directly.
This is just more out of curiosity for your future plans. I find your software a super helpful tool for annotation. However, I was wondering if you plan to bring the data of different sources together. As you state in your paper such information is often used for policy decisions and I think the approach is really useful when data from different places come together. So going in the direction of having a database for trash researchers where data can be easily uploaded from trashai and policy makers can search for locations to get information in this area could be extremely helpful. Are you already thinking in this direction?
I completely agree that it is possible to do all of this within a Colab notebook or some other programmable interface. Will share with the group to think some more about how we can better integrate this application with programmable interfaces. I think it will have to be part of a longer term development timeline though since several other workflows exist.
To be clear, I'm not necessarily advocating for this as an addition to the GUI - the reason I brought it up was more around trying to figure out the value proposition of the UI itself and whether it was providing something that couldn't be done in a simpler way, and it sounds like the UI is important to make CV tools accessible to researchers. Integration with programmable interfaces could still be valuable though (at the very least, I'd imagine it would be a good idea for the model to be available directly and not just via the UI).
Agreed here too! It is our highest development priority right now to improve the model. Certainly, if the model accuracy is limiting the functional use of the tool, then it's an issue we should resolve in this tool, and I don't think that the only resolution is to improve the model accuracy. We currently report the model confidence in the labels, which I believe helps the user interpret the accuracy. Perhaps there is another aspect of the software you think would be helpful to add for a user who is facing functionality-related issues?
The confidence labels are definitely a great idea. At some level, though, model accuracy itself is a barrier to the usefulness of this tool - no matter what else you do, if you have to manually re-tag all the images anyway, it's not helping automate the process, which is the whole point... Unfortunately it's a weird situation where the model itself isn't the primary thing under scrutiny for JOSS, but it's a critical part of the application's functionality.
@domna Thanks so much for the comments and thorough review. Some responses in line below.
I agree with @luxaritas view on the deployment, that the software should be easily deployable to any hosting solution or local deployment. While you provide a localdev option I see it, however, as an option for doing local development for an aws based system instead of as a full-fledged deployment solution. From my view the localstack containers and backend upload options are not really necessary for a local deployment. What I would find nice to have would be:
Documentation on how to compile the front-end and directly upload it to any hosting solution (ideally with disabled backend upload option, so you really only need a webspace or even could use gh pages).
- Could you provide an example of an application that has this kind of functionality? I am not super familiar with it but definitely interested in figuring out how we can do it.
A single docker container which contains the compiled gui so admins can easily deploy in a docker based system. It would also be nice if this container would be uploaded to gh packages or dockerhub, so you could just provide a docker-compose file people could use to deploy the service.
- Is this something different than what we already have for docker-compose deployment? https://github.com/code4sac/trash-ai/blob/production/docs/localdev.md#option-2-using-the-shell
Privacy
While it is stated in the paper that some of the images are uploaded to an S3 storage there is no information about it on your trashai.org webpage. I think it is at least necessary to put a disclaimer to users that their image data is uploaded and that images may contain delicate information through exif data (because I think the targeted audience is not always aware of what information images may contain). In my view it would even be better to have an opt-out option for image upload and/or stripping unnecessary parts of the exif data while uploading and/or informing users of uploading process to aws while also putting information resources to exif data directly on the page.
- Agree with you 100% on this. Just made an issue to fix it. https://github.com/code4sac/trash-ai/issues/122
Example data / Tutorial
I find your video tutorial very nice and good to follow. Additionally, I think it would be nice if you'd also have a folder of example data with which new users could directly follow your tutorial. Maybe even with a post-processing example of json data.
- Thanks for the kind words. Definitely agree here too and have made this issue to deal with it. https://github.com/code4sac/trash-ai/issues/123
Documentation and data structure
I agree with @luxaritas that the written documentation could be expanded. What I miss the most is a detailed explanation of the json data structure for post-processing software. Which fields are expected and what information do they contain? Are there fields which are not available for all data. As I am not a familiar with trash research I was also wondering whether there is some standardized format for data exchange on trash location data and labelling which could be used here (I think the Hapich et al 2022 paper is elaborating on this, so it would be nice to read some more details of the connection of this trash taxonomy to the data format of trashai). Also I understand that your targeted audience is less technical so dealing with json data may be a barrier. I think it would also be good to offer a pdf based overview (like an analysis page which could just be printed into a pdf - so users could directly have a map overview of their trash expedition).
- Thanks for this recommendation. It is extremely helpful and I have made an issue for it here: https://github.com/code4sac/trash-ai/issues/124
Reference to other software
I was missing information on which software people typically use for researching trash images. Since I'm not familiar with the field I may not be aware that such software is not really available, but I think it would be useful to give a glimpse into the working process around your tool (like: what do I actually do with the downloaded json data?). It would also be interesting to know whether there are some widely known databases where you can put your trash data, as I think the useful part of such information is putting a lot of data from different researchers together to give a broad overview of littering places. I think especially it would be nice to have some guidance what to do with the json data downloaded by your tool. Is there some analysis software for that?
- Great idea! Added in these here: https://github.com/code4sac/trash-ai/issues/124
I was also very happy to read that you plan to upload your data to the taco dataset. Do you also plan on adding a feature where users can directly annotate their data in trashai and then upload the annotation together with the images to trashai. I think this could be a powerful tool of advancing the taco dataset, since it would target the erroneous classifications of the model directly.
- Thanks for the kind words. We do plan on adding in that feature and are actively working on it. Glad to hear you also think it will be useful.
Future of the software
This is just more out of curiosity for your future plans. I find your software a super helpful tool for annotation. However, I was wondering if you plan to bring the data of different sources together. As you state in your paper such information is often used for policy decisions and I think the approach is really useful when data from different places come together. So going in the direction of having a database for trash researchers where data can be easily uploaded from trashai and policy makers can search for locations to get information in this area could be extremely helpful. Are you already thinking in this direction?
- Love this vision you are sharing. We don't currently have a central repository for trash images, and historically image data hasn't been shared very much in our field even though it has been used in many studies. I will think on this some more to consider how best to integrate the tools. We have a portal we have been developing for sharing images and other data in a standardized format for microplastics (wincowger.shinyapps.io/validator). It could be used for trash images and data as well, but we haven't extended the schema for it yet. Will think about this for the long-term roadmap of the tool for sure.
@luxaritas Thanks for the follow up clarification.
To be clear, I'm not necessarily advocating for this as an addition to the GUI - the reason I brought it up was more around trying to figure out the value proposition of the UI itself and whether it was providing something that couldn't be done in a simpler way, and it sounds like the UI is important to make CV tools accessible to researchers. Integration with programable interfaces could still be valuable though (at the very least, I'd imagine it would be a good idea for the model to be available directly and not just via UI).
- I definitely agree with you that some people will prefer something like Colab. Also super good idea to add documentation about where the model is and how people can use it without the tool. Added an issue for it. https://github.com/code4sac/trash-ai/issues/125
The confidence labels are definitely a great idea. At some level though model accuracy itself is a barrier to usefulness of this tool - no matter what else you do, if you have to manually re-tag all the images anyways it's not helping automate the process which is the whole point... Unfortunately it's a weird situation where the model itself isn't the primary thing under scrutiny for JOSS, but it's a critical part of the application's functionality
- I agree with you that accuracy is a barrier to usability. It is going to take us a long time to build a general trash AI, but I don't think that precludes us from getting this publication out and allowing people to begin using this tool. It is actually already being used by state agencies around California and a country-wide trash survey: https://www.notracetrails.com/trashteam. In the end, it is a double-edged sword: we need people to use the tool so that we can get the data we need to improve it, but they are less likely to use it until it performs better. In the long term, we expect it to be a standard in trash research.
@wincowgerDEV Thank you for the detailed response and opening of issues.
Documentation on how to compile the front-end and directly upload it to any hosting solution (ideally with disabled backend upload option, so you really only need a webspace or even could use gh pages).
- Could you provide an example of an application that has this kind of functionality? I am not super familiar with it but definitely interested in figuring out how we can do it.
I think I explained this point in an overly complicated way. I just meant that it would be good to have documentation on how to generate a static JavaScript bundle and upload it to a self-hosted webspace (e.g. locally for people wanting to use this model in an intranet or so). Specifically, this means having something like
```shell
cd frontend
yarn build
cp -r dist <your-webspace-folder>
```
in the documentation. Regarding the disabled backend, I've seen that it is possible to just use the frontend without any issues while testing this. For gh pages it would just mean having an automatic GitHub action to generate the webpage, which would compile the JS and copy it. You can simply use this action here https://github.com/marketplace/actions/github-pages-action and add your yarn build step. I think it should just work. The benefit would mainly be for development purposes, so people cloning the repo could immediately roll out their changes through this action.
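For illustration, a minimal workflow sketch (untested; the branch name, directories, and build command are assumptions about your frontend setup) could be:

```yaml
# Hypothetical GitHub Pages deployment; paths and branch are assumptions.
name: Deploy frontend to GitHub Pages
on:
  push:
    branches: [production]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 16
      - run: yarn install --frozen-lockfile
        working-directory: frontend
      - run: yarn build
        working-directory: frontend
      - uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: frontend/dist
```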
A single docker container which contains the compiled gui so admins can easily deploy in a docker based system. It would also be nice if this container would be uploaded to gh packages or dockerhub, so you could just provide a docker-compose file people could use to deploy the service.
- Is this something different than what we already have for docker-compose deployment? https://github.com/code4sac/trash-ai/blob/production/docs/localdev.md#option-2-using-the-shell
Yes, currently you build directly in the docker-compose file, so people need to clone the repository. If you uploaded a pre-compiled container, people could just download a docker-compose file and run it from there; the container would be downloaded automatically from the registry and started.
Also, your localdev setup really is a local development option and is not targeted for deployment. My idea was just to build one lean container which just serves the frontend, so people can quickly run it on their desktop machines with Docker Desktop or on a local server with Docker.
A simple Dockerfile could look like this:
```dockerfile
# Build the static frontend bundle
FROM node:16.12.0 as builder
WORKDIR /app
COPY . .
RUN yarn
RUN yarn build

# Serve the compiled bundle with nginx
FROM nginx:1.23.3
COPY --from=builder /app/dist /usr/share/nginx/html
```
placed inside the frontend dir. This would just create an nginx-based container with the frontend copied as static files into its public directory.
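If such an image were published, the deployment compose file people would need could be as small as this (the image name below is made up; it would point at wherever the pre-built container ends up):

```yaml
# Hypothetical docker-compose for a pre-built frontend image; the image
# name is a placeholder, not a published package.
version: "3.8"
services:
  trash-ai:
    image: ghcr.io/code4sac/trash-ai-frontend:latest
    ports:
      - "8080:80"
    restart: unless-stopped
```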
- Agree with you 100% on this. Just made an issue to fix it. [Feature]: Need to be more explicit about data sharing code4sac/trash-ai#122
👍️
Example data / Tutorial
I find your video tutorial very nice and good to follow. Additionally, I think it would be nice if you'd also have a folder of example data with which new users could directly follow your tutorial. Maybe even with a post-processing example of json data.
- Thanks for the kind words. Definitely agree here too and have made this issue to deal with it. [Feature]: Add example dataset for users to upload to try to tool and to inspect the json output code4sac/trash-ai#123
👍️
Documentation and data structure
I agree with @luxaritas that the written documentation could be expanded. What I miss the most is a detailed explanation of the json data structure for post-processing software. Which fields are expected and what information do they contain? Are there fields which are not available for all data. As I am not a familiar with trash research I was also wondering whether there is some standardized format for data exchange on trash location data and labelling which could be used here (I think the Hapich et al 2022 paper is elaborating on this, so it would be nice to read some more details of the connection of this trash taxonomy to the data format of trashai). Also I understand that your targeted audience is less technical so dealing with json data may be a barrier. I think it would also be good to offer a pdf based overview (like an analysis page which could just be printed into a pdf - so users could directly have a map overview of their trash expedition).
- Thanks for this recommendation. It is extremely helpful and I have made an issue for it here: [Feature]: Data Structure Update code4sac/trash-ai#124
👍️
Reference to other software
I was missing information on which software people typically use for researching trash images. Since I'm not familiar with the field I may not be aware that such software is not really available, but I think it would be useful to give a glimpse into the working process around your tool (like: what do I actually do with the downloaded json data?). It would also be interesting to know whether there are some widely known databases where you can put your trash data, as I think the useful part of such information is putting a lot of data from different researchers together to give a broad overview of littering places. I think especially it would be nice to have some guidance what to do with the json data downloaded by your tool. Is there some analysis software for that?
- Great idea! Added in these here: [Feature]: Data Structure Update code4sac/trash-ai#124
👍️
I was also very happy to read that you plan to upload your data to the taco dataset. Do you also plan on adding a feature where users can directly annotate their data in trashai and then upload the annotation together with the images to trashai. I think this could be a powerful tool of advancing the taco dataset, since it would target the erroneous classifications of the model directly.
- Thanks for the kind words. We do plan on adding in that feature and are actively working on it. Glad to hear you also think it will be useful.
Future of the software
This is just more out of curiosity for your future plans. I find your software a super helpful tool for annotation. However, I was wondering if you plan to bring the data of different sources together. As you state in your paper such information is often used for policy decisions and I think the approach is really useful when data from different places come together. So going in the direction of having a database for trash researchers where data can be easily uploaded from trashai and policy makers can search for locations to get information in this area could be extremely helpful. Are you already thinking in this direction?
- Love this vision you are sharing. We don't currently have a central repository for trash images and historically image data hasn't been shared very much in our field even though they have been used in many studies. I will think on this some more to consider how best to integrate the tools. We have a portal we have been developing for sharing images and other data in a standardized format of microplastics wincowger.shinyapps.io/validator. It could be used for trash images and data as well but we haven't extended the schema for it yet. Will think about this for the long term roadmap of the tool for sure.
Yes, bringing all of this data from different sources together is probably a hard endeavor. While I think in the long run it will be extremely helpful to have such a database, I think it's a lot of work harmonising different metadata schemes and building the APIs. I don't know if anything has happened yet in this direction in your research community (like metadata standards or so).
@domna Thanks for the feedback here. I created new issues for both of your first two points - love the ideas, and it's a lot clearer to me now what you were thinking. I need to circle back with the developers on this to see how challenging it will be to integrate these, but we will definitely give them a shot. https://github.com/code4sac/trash-ai/issues/127 https://github.com/code4sac/trash-ai/issues/126
Yes, bringing all of this data from different sources together is probably a hard endeavor. While I think in the long run it will be extremely helpful to have such a database, I think it's a lot of work harmonising different metadata schemes and building the APIs. I don't know if anything has happened yet in this direction in your research community (like metadata standards or so).
Totally agree it's a big messy process to harmonize schemas. There are some standards for reporting in the field, but nothing like a JSON Schema to my knowledge. Everything is expected to live in local databases in the field today or in spreadsheet-like files. We will build from those advancements to inform our work here.
Hey @luxaritas and @domna, Hope you are both doing well. Just checking in that you are both done with your first round of reviews. If so, I will begin making revisions on the repo.
Hi @wincowgerDEV,
yes, thank you. Please go ahead and start your revisions. Feel free to also refer to me in the issues and I'll follow the updates. Looking forward to your changes!
Hey @luxaritas and @domna, Hope you are both doing well. Just checking in that you are both done with your first round of reviews. If so, I will begin making revisions on the repo.
:+1: I think it makes sense to start incorporating reviewer feedback at this stage @wincowgerDEV.
Awesome! Thank you 🙂
Yep, same here, I've shared all the thoughts I have at this point
@wincowgerDEV – do you have any updates on how you're getting on incorporating the feedback from reviewers?
Thanks for checking in. We have been tracking the progress through issues on our repo labeled "reviewer comments". We are getting close to incorporating all the comments that we can; there are just two left, focused on the data output format and documentation. I think we need one more month to complete all the updates.
Got it. Thanks for the update!
Hey @luxaritas, I want to make sure that we can meet your expectations for automated tests:
There doesn't seem to be any automated tests. While the desired behavior here is relatively clear/trivial, there's no formal instructions to verify behavior.
We are thinking about doing the following: 1) App stability is currently verified by GitHub Actions, which would automatically throw an error if something were wrong with the deployment. 2) We will write out instructions for using JSON schemas to validate the data output format and provide a tutorial on feeding a test example to the tool and receiving an expected output that the user can test against.
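For reference, here is a rough sketch of what the JSON-schema check in point 2 could look like; the schema fields below are placeholders rather than our actual output format:

```python
# Rough sketch: validate a Trash AI JSON export against a schema using the
# jsonschema package. The schema here is a placeholder, not the real format.
import json
from jsonschema import validate

output_schema = {
    "type": "object",
    "required": ["images"],
    "properties": {
        "images": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["filename", "detections"],
                "properties": {
                    "filename": {"type": "string"},
                    "detections": {"type": "array"},
                },
            },
        },
    },
}

with open("trash_ai_output.json") as f:
    validate(instance=json.load(f), schema=output_schema)
print("Output matches the expected schema.")
```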
Question: Do you think these tests are enough or would you like to see additional specific automated tests?
The JOSS requirement is:
Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
The way I personally interpret that is: preferably with an automated script, but worst case with specific instructions for testing "by hand", you should be able to ensure the key portions of the app's functionality behave as expected. Your current GitHub actions ensures that the deployment is successful, but as far as I can tell does not validate anything about the actual functionality of the application. Things that would be good to verify could include:
When I went to look at the application just now, it actually looked like it wasn't processing any data when running the samples - formal tests should ensure you catch that sort of thing before changes are launched!
If you wanted to automate that, you might consider spinning up a local environment in GitHub Actions and then using something like Cypress/Selenium/etc to interact with the page and verify it behaves as you expect, though again, per JOSS, the minimum requirement is well-defined test criteria that can be verified by hand. How you perform the tests and how many specific test cases you have and their granularity (just end-to-end tests? unit tests? both?) is ultimately up to you, but the idea is for you to have higher confidence in making changes, catch issues quicker, and make it easier for new contributors to feel comfortable making changes - automation helps amplify those benefits because you can run your validation with a single command, and even do such "smoke checks" on every commit/PR/etc without having to manually perform those checks.
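As a rough illustration of what I mean (I haven't run this against your app; the URL, selectors, and expected text are placeholders), a browser-driven check could look something like:

```python
# Rough end-to-end check sketch using Selenium; the URL, selectors, and
# expected text are placeholders and would need to match the real UI.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("http://localhost:8080")  # assumed local dev URL
    # Upload a known sample image through the file input.
    file_input = driver.find_element(By.CSS_SELECTOR, "input[type='file']")
    file_input.send_keys("/path/to/sample_trash_image.jpg")
    # Wait for a detection result to be rendered, then assert on it.
    WebDriverWait(driver, 60).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".detection-label"))
    )
    assert "plastic" in driver.page_source.lower()
finally:
    driver.quit()
```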
Thanks for this follow-up. We greatly appreciate this detailed feedback. It is extremely helpful for us moving forward to think about how best to automate the testing of our app. I will take these recommendations to the web guru and see what he thinks he can implement before we resubmit this publication. Agreed that having a separate service that tests the client-side experience would be ideal for ensuring performance, but I also know these can be exceptionally challenging to implement and maintain, so I don't want to overburden the devs - we are all volunteers.
We are really close to finishing up the review. We found someone who can add the automated testing for the Docker setup, so we are doing that before we resubmit. I think they are getting close to finishing it.
Wanted to give another update here. We got stalled on the automated tests because they were too challenging for our team to implement, but we have added quite a few manual tests, and the Docker build process handles the vast majority of the basic existence checks. All updates have been incorporated into the live repo https://github.com/code4sac/trash-ai. We believe that we have satisfactorily addressed all JOSS requirements and did our best to implement the additional ideas for improvement proposed by the reviewers.
You can see the issues which we resolved specifically pertaining to the reviewers' comments here: https://github.com/code4sac/trash-ai/issues?q=label%3A%22reviewer+comment%22+is%3Aclosed
The majority of the reviewers' comments were resolved in this pull request: https://github.com/code4sac/trash-ai/pull/181, but also in many others that you will see referenced in the issues.
What is the best way to proceed with the review from here?
Thanks for the update @wincowgerDEV. As I think has been discussed earlier in this thread, automated tests are desirable but not essential to pass a JOSS review, provided there are other mechanisms (e.g., testing procedures that can be followed to verify the behaviour of the software).
@domna, @luxaritas – based on @wincowgerDEV's update here, could you please take one final pass over your checkboxes above?
What is the best way to proceed with the review from here?
@wincowgerDEV - based on what feels like a converging conversation, if the reviewers are able to verify your responses and updates to the package I think we're close to being done here 😄
Hey @domna and @luxaritas, I am thankful for all the effort you have already put in and recognize that it took me a while to go through the updates so it's challenging to get started again. Below I go through each of the check boxes you have open and provide my rationale for why it should be checked:
@editorialbot check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.5281/zenodo.4154370 is OK
- 10.1186/s43591-022-00035-1 is OK
- 10.1029/2019EA000960 is OK
- 10.1186/s40965-018-0050-y is OK
- 10.1016/j.wasman.2021.12.001 is OK
- 10.48550/ARXIV.2003.06975 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@wincowgerDEV Thanks for the updates. For the remaining things on my end:
- The references check looks to be good now
- For functionality documentation, would it be easy to add a basic textual guide (with screenshots as appropriate) either to the readme or a separate markdown file in the docs directory with a link from the readme? Doesn't need to be exhaustive, but a video can be harder to use as an at-a-glance overview or quick reference
- Similarly for automation, could you add maybe a checklist in a file in the docs directory that describes the testing procedure? The demo video is great, but doesn't provide a clear list of steps of things that should be checked to know the software is behaving appropriately.
I'm hoping both of those things should be very quick to write up, maybe half a page or a page of text each.
Thanks for the feedback and second look. I think these things are not too hard to do but will still take some additional time on my part so I will need a few weeks. Made an issue here: https://github.com/code4sac/trash-ai/issues/190
@luxaritas, I completed your additional requests linked in the issue above https://github.com/code4sac/trash-ai/issues/190. Let me know if there is anything else I can do.
@domna, have you had a chance to re-review the repo, or is there anything else I can do? As a reminder, please see this for my rationale for acceptance: https://github.com/openjournals/joss-reviews/issues/5136#issuecomment-1692542881
Hi @wincowgerDEV,
sorry for the delayed answer and thank you for the ping. I just got back from my holidays and I will re-review in the next few days.
I agree with @luxaritas point, that it would be nice to have a top-level README in the docs folder which is automatically displayed when navigating there and gives a short overview.
Despite this everything looks good to me and it's good to go from my side.
I agree with @luxaritas point, that it would be nice to have a top-level README in the docs folder which is automatically displayed when navigating there and gives a short overview.
Despite this everything looks good to me and it's good to go from my side.
Thanks @domna for your positive feedback and additional comment. I just created the top level docs as per your suggestion: https://github.com/code4sac/trash-ai/commit/b46835c2a83609de0e690ddc3e854803cce7522e
@luxaritas, is there anything additional you would like to see or are you good with checking the final boxes per my updates?
Warm Regards, Win
LGTM - thank you!
@arfon, Both reviewers have graciously checked all the boxes for the review. I have thoroughly enjoyed improving this work and am very thankful for all the effort. Please let me know when I should begin the final publication process.
@wincowgerDEV – looks like we're very close to being done here. I will circle back here next week, but in the meantime, please give your own paper a final read to check for any potential typos etc.
After that, could you make a new release of this software that includes the changes that have resulted from this review? Then, please make an archive of the software in Zenodo/figshare/other service and update this thread with the DOI of the archive. For the Zenodo/figshare archive, please make sure that:
- The title of the archive is the same as the JOSS paper title
- The authors of the archive are the same as the JOSS paper authors
I can then move forward with accepting the submission.
Thanks @arfon, responses below:
@wincowgerDEV – looks like we're very close to being done here. I will circle back here next week, but in the meantime, please give your own paper a final read to check for any potential typos etc.
Went through one last time and all seems correct.
After that, could you make a new release of this software that includes the changes that have resulted from this review. Then, please make an archive of the software in Zenodo/figshare/other service and update this thread with the DOI of the archive? For the Zenodo/figshare archive, please make sure that:
- The title of the archive is the same as the JOSS paper title
- That the authors of the archive are the same as the JOSS paper authors
- I can then move forward with accepting the submission.
I made this Zenodo archive with the most up-to-date release, titled it the same as the manuscript, and added the same authors in order: https://zenodo.org/record/8384126
@editorialbot set 10.5281/zenodo.8384126 as archive
Done! archive is now 10.5281/zenodo.8384126
@editorialbot recommend-accept
Attempting dry run of processing paper acceptance...
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.5281/zenodo.4154370 is OK
- 10.1186/s43591-022-00035-1 is OK
- 10.1029/2019EA000960 is OK
- 10.1186/s40965-018-0050-y is OK
- 10.1016/j.wasman.2021.12.001 is OK
- 10.48550/ARXIV.2003.06975 is OK
MISSING DOIs
- None
INVALID DOIs
- None
Submitting author: @wincowgerDEV (Win Cowger)
Repository: https://github.com/code4sac/trash-ai
Branch with paper.md (empty if default branch):
Version: 1.0
Editor: @arfon
Reviewers: @domna, @luxaritas
Archive: 10.5281/zenodo.8384126
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@domna & @luxaritas, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:
@editorialbot generate my checklist
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @arfon know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
📝 Checklist for @domna
📝 Checklist for @luxaritas