[PRE REVIEW]: Cost-Effective Big Data Orchestration Using Dagster: A Multi-Platform Approach

editorialbot commented 2 months ago

Submitting author: @HPicatto
Editor: @HaoZeke
Reviewers: Pending
Managing EiC: Daniel S. Katz

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/b0bfeb890f8d120de4e13dd52f9d5177"><img src="https://joss.theoj.org/papers/b0bfeb890f8d120de4e13dd52f9d5177/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/b0bfeb890f8d120de4e13dd52f9d5177/status.svg)](https://joss.theoj.org/papers/b0bfeb890f8d120de4e13dd52f9d5177)

Author instructions

Thanks for submitting your paper to JOSS @HPicatto. Currently, there isn't a JOSS editor assigned to your paper.

@HPicatto if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). You can search the list of people that have already agreed to review and may be suitable for this submission.

Editor instructions

The JOSS submission bot @editorialbot is here to help you find and assign reviewers and start the main review. To find out what @editorialbot can do for you type:

@editorialbot commands

editorialbot commented 2 months ago

Hello human, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 2 months ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1016/j.future.2023.12.026 is OK
- 10.1145/2934664 is OK
- 10.1145/3472883.3486982 is OK
- 10.1145/3514221.3526054 is OK
- 10.1007/s11192-020-03726-9 is OK
- 10.5281/zenodo.7196590 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: Dagster | Cloud-native Orchestration of Data Pipel...
- No DOI given, and none found for title: Cost efficient alternative to databricks lock-in

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

editorialbot commented 2 months ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.04 s (986.8 files/s, 128171.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          29            568            416           3557
Markdown                         3            126              0            442
TOML                             4             27              2            128
TeX                              1              1              0            103
make                             1              6             21             51
JSON                             4              0              0              7
-------------------------------------------------------------------------------
SUM:                            42            728            439           4288
-------------------------------------------------------------------------------

Commit count by author:

    24  geoHeil
     9  Georg Heiler
     8  HPicatto
     2  Hernan
     1  CI Hotfix

editorialbot commented 2 months ago

Paper file info:

📄 Wordcount for paper.md is 2025

✅ The paper includes a Statement of need section

editorialbot commented 2 months ago

License info:

✅ License found: MIT License (Valid open source OSI approved license)

editorialbot commented 2 months ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

editorialbot commented 2 months ago

Five most similar historical JOSS papers:

High-performance neural population dynamics modeling enabled by scalable computational infrastructure Submitting author: @a9p Handling editor: @emdupre (Active) Reviewers: @richford, @tachukao Similarity score: 0.6596

fseval: A Benchmarking Framework for Feature Selection and Feature Ranking Algorithms Submitting author: @dunnkers Handling editor: @diehlpk (Active) Reviewers: @mcasl, @estefaniatalavera Similarity score: 0.6578

EspressoDB: A scientific database for managing high-performance computing workflows Submitting author: @ckoerber Handling editor: @gkthiruvathukal (Active) Reviewers: @remram44, @ixjlyons Similarity score: 0.6475

SCAS dashboard: A tool to intuitively and interactively analyze Slurm cluster usage Submitting author: @wathom Handling editor: @danielskatz (Active) Reviewers: @aturner-epcc, @phargogh Similarity score: 0.6428

strucscan: A lightweight Python-based framework for high-throughput material simulation Submitting author: @thohamm Handling editor: @ppxasjsm (Active) Reviewers: @mturiansky, @wcwitt Similarity score: 0.6416

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

danielskatz commented 2 months ago

👋 @HPicatto - thanks for your submission.

Before we proceed, there are a few items that need updating:

- For the affiliations, we don't really need addresses, just institutions and countries.
- Your paper is too long. JOSS requests papers that are roughly 250-1000 words, and yours is over twice as long. See the example paper. Perhaps there is material you can remove and either move to the README or elsewhere in the repo, or in the documentation? (For example, the implementation challenges section might fit a traditional paper about the software, but it doesn't really fit a JOSS paper. And the platform comparison part might be moved to the repo.)

Feel free to use the command @editorialbot check repository to run some checks, one of which calculates the word count of the paper. Note that editorialbot commands need to be the first entry in a new comment.

Once these changes are made, please ping me and we can get the review started.

HPicatto commented 1 month ago

Hi @danielskatz, We have finished resizing the content in an internal branch, and we moved two sections into .md files. Could you guide us on how we should reference these sections within the main Markdown file so that they are properly linked in the paper? Should we use the final GitHub URL, or is there a specific way that JOSS handles appendices or external references? Thank you for your help!

danielskatz commented 1 month ago

JOSS does not consider these appendices, but they are external references, so you can just use URLs to them, saying that they are in the GitHub repo.

HPicatto commented 1 month ago

Hi @danielskatz, we updated main. Could you please give us new feedback?

danielskatz commented 1 month ago

@HPicatto - The commands I'm now going to run to check the length of the paper, check references, and regenerate the paper are all commands you can run too. Note that editorialbot commands need to be the first entry in a new comment.

danielskatz commented 1 month ago

@editorialbot check repository

danielskatz commented 1 month ago

@editorialbot check references

danielskatz commented 1 month ago

@editorialbot generate pdf

editorialbot commented 1 month ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.04 s (992.2 files/s, 124633.3 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          31            585            427           3606
Markdown                         3            126              0            442
TOML                             4             27              1            124
TeX                              1              1              0            103
make                             1              6             21             51
JSON                             4              0              0              7
-------------------------------------------------------------------------------
SUM:                            44            745            449           4333
-------------------------------------------------------------------------------

Commit count by author:

    26  geoHeil
     9  Georg Heiler
     8  HPicatto
     2  CI Hotfix
     2  Hernan

editorialbot commented 1 month ago

Paper file info:

📄 Wordcount for paper.md is 2025

✅ The paper includes a Statement of need section

editorialbot commented 1 month ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1016/j.future.2023.12.026 is OK
- 10.1145/2934664 is OK
- 10.1145/3472883.3486982 is OK
- 10.1145/3514221.3526054 is OK
- 10.1007/s11192-020-03726-9 is OK
- 10.5281/zenodo.7196590 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: Dagster | Cloud-native Orchestration of Data Pipel...
- No DOI given, and none found for title: Cost efficient alternative to databricks lock-in

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

editorialbot commented 1 month ago

License info:

✅ License found: MIT License (Valid open source OSI approved license)

editorialbot commented 1 month ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

editorialbot commented 1 month ago

Five most similar historical JOSS papers:

High-performance neural population dynamics modeling enabled by scalable computational infrastructure Submitting author: @a9p Handling editor: @emdupre (Active) Reviewers: @richford, @tachukao Similarity score: 0.6595

fseval: A Benchmarking Framework for Feature Selection and Feature Ranking Algorithms Submitting author: @dunnkers Handling editor: @diehlpk (Active) Reviewers: @mcasl, @estefaniatalavera Similarity score: 0.6576

EspressoDB: A scientific database for managing high-performance computing workflows Submitting author: @ckoerber Handling editor: @gkthiruvathukal (Active) Reviewers: @remram44, @ixjlyons Similarity score: 0.6474

SCAS dashboard: A tool to intuitively and interactively analyze Slurm cluster usage Submitting author: @wathom Handling editor: @danielskatz (Active) Reviewers: @aturner-epcc, @phargogh Similarity score: 0.6427

strucscan: A lightweight Python-based framework for high-throughput material simulation Submitting author: @thohamm Handling editor: @ppxasjsm (Active) Reviewers: @mturiansky, @wcwitt Similarity score: 0.6414

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

danielskatz commented 1 month ago

@HPicatto - One issue is that the paper still appears to be twice as long as JOSS recommends. As I skim it, the Implementation Challenges section doesn't really seem to fit JOSS's model, though if this were a paper in another venue about the process of developing the software, it would make sense. That's the only section I would suggest removing, and without it, I would be ok going ahead with the review even if the paper was still a bit long.

Another minor issue is that we don't need your mailing addresses in your affiliations - just institution and country is enough, along with unit (e.g. department, division, etc.) if you want.

Thanks for the progress!

HPicatto commented 1 month ago

@editorialbot check repository

HPicatto commented 1 month ago

@editorialbot check references

HPicatto commented 1 month ago

@editorialbot generate pdf

editorialbot commented 1 month ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.04 s (1090.3 files/s, 131612.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          31            585            427           3606
Markdown                         5            133              0            463
TOML                             4             27              1            124
TeX                              1              1              0            101
make                             1              6             21             51
JSON                             4              0              0              7
-------------------------------------------------------------------------------
SUM:                            46            752            449           4352
-------------------------------------------------------------------------------

Commit count by author:

    26  geoHeil
     9  Georg Heiler
     9  HPicatto
     3  CI Hotfix
     2  Hernan

editorialbot commented 1 month ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1016/j.future.2023.12.026 is OK
- 10.1145/2934664 is OK
- 10.1145/3472883.3486982 is OK
- 10.1145/3514221.3526054 is OK
- 10.5281/zenodo.7196590 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: Dagster | Cloud-native Orchestration of Data Pipel...
- No DOI given, and none found for title: Cost efficient alternative to databricks lock-in

❌ MISSING DOIs

- 10.1007/s11192-020-03726-9 may be a valid DOI for title: Web mining for innovation ecosystem mapping: a fra...

❌ INVALID DOIs

- None

editorialbot commented 1 month ago

Paper file info:

📄 Wordcount for paper.md is 1300

✅ The paper includes a Statement of need section

editorialbot commented 1 month ago

License info:

🟡 License found: GNU General Public License v3.0 (Check here for OSI approval)

HPicatto commented 1 month ago

Hi @danielskatz, the new version is here. The length is now 1300 words; do you think that's acceptable? It's not easy to keep reducing the length.

editorialbot commented 1 month ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

editorialbot commented 1 month ago

Five most similar historical JOSS papers:

PyDGN: a Python Library for Flexible and Reproducible Research on Deep Learning for Graphs Submitting author: @diningphil Handling editor: @arfon (Active) Reviewers: @idoby, @sepandhaghighi Similarity score: 0.6405

strucscan: A lightweight Python-based framework for high-throughput material simulation Submitting author: @thohamm Handling editor: @ppxasjsm (Active) Reviewers: @mturiansky, @wcwitt Similarity score: 0.6393

Mantik: A Workflow Platform for the Development of Artificial Intelligence on High-Performance Computing Infrastructures Submitting author: @rico-berner Handling editor: @arfon (Active) Reviewers: @zhaozhang, @gflofst Similarity score: 0.6366

DASF: A data analytics software framework for distributed environments Submitting author: @d-eggert Handling editor: @martinfleis (Active) Reviewers: @cjwu, @pritchardn Similarity score: 0.6353

CM++ - A Meta-method for Well-Connected Community Detection Submitting author: @chackoge Handling editor: @arfon (Active) Reviewers: @LuisScoccola, @chryswoods Similarity score: 0.6330

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

danielskatz commented 1 month ago

@HPicatto - sorry I missed your note above. Yes, I think this length is fine. Can you also add the missing DOI that editorialbot found in your .bib file, then please recheck the references and regenerate the pdf? I will mark this as unpaused, but will need to wait until I can find an editor with capacity to take this on.

danielskatz commented 1 month ago

👋 @HaoZeke - it looks like one of your edited submissions is just about done - would you be able to take on this one?

danielskatz commented 1 month ago

@editorialbot invite @HaoZeke as editor

editorialbot commented 1 month ago

Invitation to edit this submission sent!

danielskatz commented 1 month ago

👋 @hugoledoux - While I finish accepting another of your edited submissions, I wonder if you would be willing to become the editor for this one?

danielskatz commented 1 month ago

@editorialbot invite @hugoledoux as editor

editorialbot commented 1 month ago

Invitation to edit this submission sent!

HaoZeke commented 1 month ago

Hi @danielskatz, I will be able to edit this, with the caveat that I will not be able to start soliciting reviews until the weekend / Monday morning. If that's alright, I'll go ahead and assign myself.

danielskatz commented 1 month ago

Thanks @HaoZeke - that would be great!

danielskatz commented 1 month ago

@editorialbot assign @HaoZeke as editor

editorialbot commented 1 month ago

Assigned! @HaoZeke is now the editor

HaoZeke commented 1 month ago

@editorialbot remind @HaoZeke in 3 days

editorialbot commented 1 month ago

Reminder set for @HaoZeke in 3 days

editorialbot commented 4 weeks ago

👋 @HaoZeke, please take a look at the state of the submission (this is an automated reminder).

HPicatto commented 4 weeks ago

@editorialbot check references

editorialbot commented 4 weeks ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1016/j.future.2023.12.026 is OK
- 10.1145/2934664 is OK
- 10.1145/3472883.3486982 is OK
- 10.1145/3514221.3526054 is OK
- 10.1007/s11192-020-03726-9 is OK
- 10.5281/zenodo.7196590 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: Dagster | Cloud-native Orchestration of Data Pipel...
- No DOI given, and none found for title: Cost efficient alternative to databricks lock-in

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

HPicatto commented 3 weeks ago

Hi, is it possible to have any update on this?

HaoZeke commented 2 weeks ago

hi @d-eggert @rico-berner 👋 would you be interested in and available to review this JOSS submission? We carry out our checklist-driven reviews here in GitHub issues and follow these guidelines: joss.readthedocs.io/en/latest/review_criteria.html

If not, could you recommend any potential reviewers? I was hoping to have your insights because of your past authorship of related JOSS publications.

openjournals / joss-reviews

[PRE REVIEW]: Cost-Effective Big Data Orchestration Using Dagster: A Multi-Platform Approach #7267
