m2lines / data-gallery

https://m2lines.github.io/data-gallery/
Apache License 2.0
0 stars 2 forks source link

Unet viz #42

Closed suryadheeshjith closed 7 months ago

suryadheeshjith commented 8 months ago
review-notebook-app[bot] commented 8 months ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

suryadheeshjith commented 8 months ago

I have asked @adam-subel to take a look at the notebook so he will let me know when he does. @jbusecke, currently this notebook uses large enough data that cannot be stored on this repository without using git lfs so I have opted to put it on Google Drive for now and download the data while running the notebook (See https://app.reviewnb.com/m2lines/data-gallery/pull/42/). Maybe, it would be better to put it on another repository and pull the data from there?

jbusecke commented 8 months ago

Hmm I would like to avoid storing things in different locations, since it will make maintenance a lot harder. It seems like the animations are the offenders here when it comes to size? Can we use a more efficient compression algo when creating those? I feel these short/small animations do not need to be ~20MB large?

suryadheeshjith commented 8 months ago

The animation generation and size isn't really an issue here to be honest. I take care of that by skipping 10 states each time I need to update the frame in the animation. It is the data that is required i.e. ground truth (~800MB) and predictions (~1.6GB) that are large.

LaureZanna commented 7 months ago

@suryadheeshjith + @jbusecke : what's the status of this. seems like there are some conflicts (and still some questions about the data hosting).

jbusecke commented 7 months ago

Ah got it I misunderstood the original issue then. Are you generating the notebooks (which need access to data) locally or as part of a gh action?

suryadheeshjith commented 7 months ago

If the point of these notebooks is just for presentation, we do not need a data store since I can just run them locally and disable execution on jupyterbook. But, all the other notebooks can be run by someone else either locally, on binder or on leaphub. I think it would be useful to keep it consistent.

IamShubhamGupto commented 7 months ago

If the point of these notebooks is just for presentation, we do not need a data store since I can just run them locally and disable execution on jupyterbook. But, all the other notebooks can be run by someone else either locally, on binder or on leaphub. I think it would be useful to keep it consistent.

I agree with suryas point on consistency. The current setup is it compiles on the GitHub action machine and if its successful, it is published.

We don't compile locally as pushing the compiled html files is heavy again on the user side

IamShubhamGupto commented 7 months ago

also @suryadheeshjith yesterday I renamed the folder according to https://github.com/leap-stc/leap-stc.github.io/tree/main naming conventions - it magically fixed previews 🤔

When you pull it will complain a lot but its just basic renaming

suryadheeshjith commented 7 months ago

@jbusecke Shall we move forward with what we have right now?

suryadheeshjith commented 7 months ago

@jbusecke Based on our discussion today, I will move the data to leaphub and disable its execution on jupyter book for now (just like this notebook) by adding it to this list.

We can figure out using the leaphub service account for execution of these notebooks later.

jbusecke commented 7 months ago

We can figure out using the leaphub service account for execution of these notebooks later.

Sounds good to me. Can you open a targeted issue for this though? Just so we do not loose track of it later. Thanks

suryadheeshjith commented 7 months ago

@jbusecke I have uploaded the data files on the storage bucket leap-persistent under the username 'data-gallery'. I hope this is okay.

jbusecke commented 7 months ago

@jbusecke I have uploaded the data files on the storage bucket leap-persistent under the username 'data-gallery'. I hope this is okay.

Would you mind moving it to m2lines-data-gallery? Just to avoid any confusion between projects. And can you document this somewhere centrally as the one location to store data under?

suryadheeshjith commented 7 months ago

Is there a way to move files across folders on leap? I can't find any resource for that on the technical documentation. Reuploading is kind of a pain. Solved using fs.mv('leap-persistent/data-gallery', 'leap-persistent/m2lines-data-gallery', recursive=True)

suryadheeshjith commented 7 months ago

@jbusecke It would be preferable to have the data to be read-only. Clearly, I can delete/move data files that need not be mine on leaphub.

IamShubhamGupto commented 7 months ago

Hey @suryadheeshjith I need this data mainly the plot folder available for our data gallery stuff - https://zenodo.org/records/8293998 for #41

maybe add a doc showing how to upload to the buckets, thanks

suryadheeshjith commented 7 months ago

@IamShubhamGupto I have added a note in the README. I will anyway close this PR and start a fresh one so we don't need to bother with merge conflicts. @jbusecke regarding my previous comment, we can get back to it later. I will open an issue for it.