neuropoly / intranet.neuro.polymtl.ca

NeuroPoly's lab manual
https://intranet.neuro.polymtl.ca

Import datasets overview #85

Closed: kousu closed this issue 1 year ago

kousu commented 1 year ago

There is a dataset overview created by @naga-karthik and @valosekj here on Google Sheets.

I believe this would fit much better somewhere under data/. So, let's port it in.

kousu commented 1 year ago

Also, I assume this overview was motivated by the CLI to our private datasets being too abstract:

```
$ ssh git@data
PTY allocation request failed
hello user this is git@data running gitolite3 3.6.12-1 (Debian) on git 2.34.1

 R C  datasets/..*
 R W  datasets/basel-mp2rage
 R W  datasets/bavaria-quebec-spine-ms
 R W  datasets/beijing-tumor
 R W  datasets/canproco
 R W  datasets/data-single-subject_DO-NOT-USE
 R W  datasets/data_axondeepseg_bf_source
 R W  datasets/data_axondeepseg_bf_training
 R W  datasets/data_axondeepseg_tem
 R W  datasets/data_axondeepseg_users
 R W  datasets/data_axondeepseg_vcu
 R W  datasets/data_axondeepseg_wakehealth_source
 R W  datasets/data_axondeepseg_wakehealth_training
 R W  datasets/eeg-epilepsy
 R W  datasets/levin-stroke
 R W  datasets/lumbar-epfl
 R W  datasets/mni-bmpd
 R W  datasets/model_seg_exvivo_gm-wm_t2_unet2d-multichannel-softseg
 R W  datasets/msseg_challenge_2016
 R W  datasets/msseg_challenge_2021
 R W  datasets/philadelphia-pediatric
 R W  datasets/sci-colorado
 R W  datasets/sci-zurich
 R W  datasets/sct-testing-large
 R W  datasets/spine-generic-processed
 R W  datasets/template_dog_virginiatech
 R W  datasets/uk-biobank
 R W  datasets/uk-biobank-processed
 R W  datasets/umass-ms-ge-excite1.5
 R W  datasets/umass-ms-ge-hdxt1.5
 R W  datasets/umass-ms-ge-pioneer3
 R W  datasets/umass-ms-siemens-espree1.5
 R W  datasets/uqueensland_mouse

Please see https://github.com/neuropoly/data-management/blob/master/internal-server.md for more help
```

The names of the datasets aren't enough of a guide.

But since I am planning (https://github.com/neuropoly/data-management/issues/77 / https://github.com/neuropoly/computers/issues/167) to replace the CLI with a more GitHub-like one, I wonder if this overview will rapidly become superfluous once people can just browse https://data.neuro.polymtl.ca/explore, the way the public currently can with our one big open-access dataset at https://github.com/spine-generic/.
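The listing above is gitolite's `info` output: each repository line starts with permission letters (R for read, W for write, C for permission to create repositories matching a wildcard pattern) followed by the repository path. Until a web interface exists, a small script can at least turn that output into a clean list of dataset names. This is a minimal sketch that assumes the line layout shown above; `RepoAccess`, `parse_gitolite_info`, and `dataset_names` are hypothetical helpers, not part of gitolite or of any NeuroPoly tool.

```python
from typing import NamedTuple

PERMISSION_LETTERS = {"R", "W", "C"}


class RepoAccess(NamedTuple):
    path: str
    perms: frozenset[str]


def parse_gitolite_info(text: str) -> list[RepoAccess]:
    """Collect (path, permissions) pairs from gitolite `info`-style output.

    A line counts as a repository line when every token but the last is a
    permission letter; greetings, errors, and help text are ignored.
    """
    repos = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue
        *perms, path = tokens
        if all(p in PERMISSION_LETTERS for p in perms):
            repos.append(RepoAccess(path, frozenset(perms)))
    return repos


def dataset_names(repos: list[RepoAccess]) -> list[str]:
    """Keep concrete repositories under datasets/, dropping wildcard rules."""
    return [
        r.path.removeprefix("datasets/")
        for r in repos
        if r.path.startswith("datasets/") and "*" not in r.path
    ]


# A shortened copy of the listing above.
SAMPLE = """\
$ ssh git@data
PTY allocation request failed
hello user this is git@data running gitolite3 3.6.12-1 (Debian) on git 2.34.1

 R C\tdatasets/..*
 R W\tdatasets/basel-mp2rage
 R W\tdatasets/canproco

Please see https://github.com/neuropoly/data-management/blob/master/internal-server.md for more help
"""

print(dataset_names(parse_gitolite_info(SAMPLE)))  # ['basel-mp2rage', 'canproco']
```

The `R C` line is skipped by `dataset_names` because it is a creation rule for a wildcard pattern, not an actual dataset. This only tidies the names; it does not add the descriptive metadata that the Google Sheet provides.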

valosekj commented 1 year ago

> There is a dataset overview created by @naga-karthik and @valosekj here on Google.
>
> I believe this would fit much better somewhere under data/. So, port it in.

Maybe the sharing policy should also be considered. The Google Sheets table is currently private (shared only within NeuroPoly), whereas the NeuroPoly intranet is publicly available.

Tagging @jcohenadad for his opinion.

kousu commented 1 year ago

> The Google Sheets table is currently private (shared only within NeuroPoly)

Is it? I don't think it is.

While logged out:

[Screenshot_20221214_154228: the sheet, viewed while logged out]

And while logged in, I can see that link sharing is turned on:

[Screenshot_20221214_154300: the sheet's sharing settings, viewed while logged in]

That's why I thought it was okay to port it in. :thinking:

kousu commented 1 year ago

When I get data.neuro.polymtl.ca upgraded to Gitea, everything there will stay private, but we could sidestep the issue by adding, say, a wiki at https://data.neuro.polymtl.ca/datasets/awesome-overview/wiki, or by making it a policy that people hashtag their datasets in the provided Description field, so that scanning down https://data.neuro.polymtl.ca/explore gives a good overview of what's available, like this:

![Screenshot 2022-12-14 at 16-00-54 Neurogitea](https://user-images.githubusercontent.com/987487/207712866-f8398183-e6f1-49ec-a597-92202ace430b.png)

You can even filter by tag with the search button!

![Screenshot 2022-12-14 at 16-01-47 Neurogitea](https://user-images.githubusercontent.com/987487/207713062-94bb82f4-326e-4bcd-ae60-58a2231db717.png)

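Scanning and filtering by hashtag, as proposed above, boils down to pulling `#tags` out of each free-text Description and keeping the datasets that carry every requested tag. The sketch below does this locally on a name-to-description mapping. The dataset names and descriptions are made up for illustration, `hashtags` and `filter_by_tags` are hypothetical helpers, and this is not how Gitea's own search is implemented.

```python
import re

# A hashtag is "#" followed by letters, digits, underscores, or hyphens.
HASHTAG = re.compile(r"#([\w-]+)")


def hashtags(description: str) -> set[str]:
    """Return the lowercase hashtags found in a free-text Description field."""
    return {tag.lower() for tag in HASHTAG.findall(description)}


def filter_by_tags(descriptions: dict[str, str], *wanted: str) -> list[str]:
    """Return, sorted, the datasets whose description carries every wanted tag."""
    wanted_set = {t.lstrip("#").lower() for t in wanted}
    return sorted(
        name for name, desc in descriptions.items() if wanted_set <= hashtags(desc)
    )


# Hypothetical descriptions, for illustration only.
EXAMPLE = {
    "dataset-a": "Example cohort #sci #T1w #spine",
    "dataset-b": "Example cohort #ms #T1w #brain",
    "dataset-c": "Example cohort #sci #T2w #brain #EPFL",
}

print(filter_by_tags(EXAMPLE, "#T1w"))          # ['dataset-a', 'dataset-b']
print(filter_by_tags(EXAMPLE, "sci", "brain"))  # ['dataset-c']
```

Matching is case-insensitive so that `#T1w` and `#t1w` are treated as the same tag; without an agreed tag vocabulary, such near-duplicates are the main risk of a free-text policy.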
jcohenadad commented 1 year ago

> Is it? I don't think it is.

Indeed, it was not. Thank you for catching this, Nick. I would prefer that this list stay private, so I just changed it.

As for replacing this table with the future Gitea, this is an excellent idea. Having a (private) overview of the available datasets will indeed be very useful for students. It would be very useful if we could display the different fields separately, as in the current Google Sheet, instead of having tags from different categories aggregated together (e.g., #sci #T1w #brain #EPFL).
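Keeping the fields separate, as in the current Google Sheet, only needs a vocabulary that says which category each tag belongs to; the tags can then be spread into one column per category instead of one mixed list. This is a sketch under that assumption: the `TAG_CATEGORY` mapping, the category names, and the helpers `to_row` and `print_table` are hypothetical, chosen for illustration. Tags missing from the mapping land in an `other` column rather than being dropped.

```python
import re

# Hypothetical vocabulary mapping each known tag (lowercase) to a category.
TAG_CATEGORY = {
    "sci": "pathology", "ms": "pathology",
    "t1w": "contrast", "t2w": "contrast",
    "brain": "region", "spine": "region",
    "epfl": "site",
}
CATEGORIES = ["pathology", "contrast", "region", "site", "other"]
HASHTAG = re.compile(r"#([\w-]+)")


def to_row(description: str) -> dict[str, str]:
    """Spread the hashtags of one Description into one cell per category."""
    cells: dict[str, list[str]] = {c: [] for c in CATEGORIES}
    for tag in HASHTAG.findall(description):
        # Unknown tags go to the "other" column rather than being dropped.
        cells[TAG_CATEGORY.get(tag.lower(), "other")].append(tag)
    return {c: ", ".join(tags) for c, tags in cells.items()}


def print_table(descriptions: dict[str, str]) -> None:
    """Print a pipe-separated table with one row per dataset."""
    print(" | ".join(["dataset"] + CATEGORIES))
    for name, description in sorted(descriptions.items()):
        row = to_row(description)
        print(" | ".join([name] + [row[c] for c in CATEGORIES]))


# Hypothetical example, using the tags quoted above.
print_table({"dataset-c": "Example cohort #sci #T1w #brain #EPFL"})
```

The design choice here is to keep the Description field as the single source of truth and derive the sheet-like columns from it, so the overview cannot drift out of sync with the repositories.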

kousu commented 1 year ago

Great. I'll work hard to get Gitea going then.

Closing, since this isn't meant to be public.