mne-tools / mne-bids-pipeline

Automatically process entire electrophysiological datasets using MNE-Python.
https://mne.tools/mne-bids-pipeline/
BSD 3-Clause "New" or "Revised" License

BUG: Website too large to deploy #752

Closed: hoechenberger closed this issue 6 months ago

hoechenberger commented 1 year ago

It's time to cut a new release, this is the auto-generated changelog from GH:

What's Changed

New Contributors

Full Changelog: https://github.com/mne-tools/mne-bids-pipeline/compare/v1.3.0...v1.4.0

I renamed a couple of PRs because they had rather non-descriptive titles. Going forward, let's please ensure before merging that PR titles read like something we'd actually want to appear in the changelog.

@larsoner I can make the release tomorrow, just wanted to get your okay to move forward.

larsoner commented 1 year ago

Works for me! There's a lot more to add for movement compensation but this will at least get things started

agramfort commented 1 year ago

+1


hoechenberger commented 1 year ago

I tagged the release but the doc deployment is failing

And we have issues in the docs themselves. Maybe we should change the settings to fail hard on warnings:

WARNING - Documentation file 'changes.md' contains a link to '[mne_bids_pipeline._config.ssp_ecg_channel' which is not found in the documentation files.
WARNING - Documentation file 'changes.md' contains a link to '[mne_bids_pipeline._config.read_raw_bids_verbose' which is not found in the documentation files.
WARNING - Documentation file 'changes.md' contains a link to '[mne_bids_pipeline._config.mf_reference_run' which is not found in the documentation files.
WARNING - Documentation file 'changes.md' contains a link to 'mne_bids_pipeline._config.n_jobs' which is not found in the documentation files.
WARNING - Documentation file 'settings/preprocessing/maxfilter.md' contains a link to 'settings/preprocessing/mne_bids_pipeline._config.mf_destination' which is not found in the documentation files.
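Notably, most of the failing targets begin with a stray `[`, which suggests doubled brackets in the changelog's cross-reference links. A pre-deploy sanity check could flag these before they reach the site; the helper below is a hypothetical sketch (not repo code), assuming targets look like `[text][module.attr]`:

```python
import re


def find_malformed_refs(text: str) -> list[str]:
    """Flag cross-reference targets that start with a stray '[',
    e.g. '[[mne_bids_pipeline._config.n_jobs]' instead of
    '[mne_bids_pipeline._config.n_jobs]'."""
    return re.findall(r"\[\[+[\w.]+", text)
```

Running this over `changes.md` before building would catch the doubled-bracket cases (though not the last warning, which is a relative-path problem rather than a bracket problem).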

larsoner commented 1 year ago

I tried restarting the deployment but it failed. Looks like this is the issue:

remote: fatal: pack exceeds maximum allowed size (2.00 GiB)        

So we probably need to find a way to make our website smaller

hoechenberger commented 1 year ago

We could include only the reports for the first participant and grand average for each example

larsoner commented 1 year ago

I'm not sure that will be enough:

  1. git clean -xdf gives a docs/ size of ~400 kB
  2. ./build_docs.sh results in 707 MB
  3. If I add the logic to only use the first participant and the average, then run git clean and build_docs.sh again, it goes down to 597 MB

We also will probably need to do something with all the previous versions of the docs. If those are all 500-700 MB then that's going to be a problem going forward as we add more and more 200-300 MB docs versions (assuming we can even get them that small!).
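To put numbers on that worry: with git's 2 GiB pack limit and the estimate above, the arithmetic is short. The ~250 MB/version figure below is an assumption taken from the 200-300 MB estimate, not a measurement:

```python
# How many docs versions fit under the remote's pack limit?
# "remote: fatal: pack exceeds maximum allowed size (2.00 GiB)"
PACK_LIMIT_MB = 2 * 1024   # 2 GiB expressed in MB
MB_PER_VERSION = 250       # assumed size of one trimmed docs version

max_versions = PACK_LIMIT_MB // MB_PER_VERSION  # only ~8 versions fit
```

So even with aggressive trimming, keeping every version as HTML hits the limit within a couple of years of releases.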

EDIT: Code to trim:

Code in gen_examples.py to trim ``` ... if not html_report_fnames: datasets_without_html.append(dataset_name) continue # Cull to first participant and grand average n = len(html_report_fnames) stems = [f.stem for f in html_report_fnames] subjects = [s.split("_")[0] for s in stems if s.startswith("sub-")] use_subjects = {"sub-average"} use_subjects.add(sorted(set(subjects).symmetric_difference(use_subjects))[0]) html_report_fnames = [ fname for fname, subject in zip(html_report_fnames, subjects) if subject in use_subjects or True ] use_n = len(html_report_fnames) fname_iter = tqdm( html_report_fnames, desc=f" {test_name} ({use_n}/{n}))", unit="file", leave=False, ) ... ```
larsoner commented 1 year ago

I think ultimately keeping all doc versions forever will be untenable. Some options:

1. Archive and link to old doc versions

  1. Keep the last N versions available as HTML -- maybe 2, which would mean keep 1.4 (just released) and 1.3?
  2. Versions older than that get zipped, uploaded to OSF.io or similar, and we add a link to them somewhere in our docs.

Has the advantage of staying complete, but `mike` (our docs versioning tool) might not be happy with it, and it wouldn't be easy to see, for example, which config values existed in 1.0.
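The zip-and-upload step of this option could look roughly like the following. This is a sketch only; the function name, archive naming, and gh-pages layout are assumptions, and the actual OSF.io upload would still be manual or via their API:

```python
import shutil
from pathlib import Path


def archive_and_remove(site_root: str, version: str, out_dir: str) -> Path:
    """Zip one old docs version (e.g. '1.0') for upload to OSF.io,
    then drop it from the local gh-pages checkout."""
    src = Path(site_root) / version
    zip_path = shutil.make_archive(
        str(Path(out_dir) / f"mne-bids-pipeline-docs-{version}"),
        "zip",
        root_dir=src,
    )
    shutil.rmtree(src)  # keep only the archive, not the deployed HTML
    return Path(zip_path)
```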

2. Archive report HTMLs from old versions

For this, we could manually:

  1. take the report HTMLs for 1.0, 1.1, and 1.2, zip each one, and upload to OSF.io
  2. rm the HTMLs for those versions from the website
  3. Add a JavaScript HTML banner that says "report HTML links for this unsupported version of MNE-BIDS-Pipeline will not work, but they can be downloaded here: "
  4. and/or we modify the generated website to have all report download links point to the OSF.io zip link or something

Has the advantage of being browsable, but gives the sense that report HTMLs will be readily available when they're not.

3. Remove and strip out report HTMLs from old versions

I think this is my preference:

  1. rm the old report HTML files
  2. modify the generated HTML to set visibility: hidden on the "Generated output" heading and box (either manually with a search-and-replace, or in JavaScript). Or we could just remove those HTML sections altogether (though that is maybe harder to script).
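The visibility: hidden variant of step 2 could be done with a one-shot post-processing helper like this sketch. It assumes the heading carries id="generated-output" and that hiding the heading plus its following sibling is enough; both are assumptions about the generated markup:

```python
def hide_generated_output(html: str) -> str:
    """Inject CSS that hides the 'Generated output' heading and the
    element that follows it, without touching the rest of the page."""
    style = (
        "<style>"
        "#generated-output, #generated-output ~ div { visibility: hidden; }"
        "</style>"
    )
    # insert just before the closing head tag, once per page
    return html.replace("</head>", style + "</head>", 1)
```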

Has the advantage of being browsable, but the old report HTMLs are gone. This is the easiest option, and I don't think users will really care what the report HTML from unsupported MNE-BIDS-Pipeline versions used to look like.

@hoechenberger any other ideas? If not, okay with you if I just do (3)?

hoechenberger commented 1 year ago

@larsoner I don't think anyone cares about the example reports for old versions, so 3 is entirely fine with me

larsoner commented 1 year ago

Okay locally I did:

$ git clone --single-branch --branch gh-pages git@github.com:mne-tools/mne-bids-pipeline-website
$ cd mne-bids-pipeline-website
$ find $PWD/1.3/examples/ -type d -name "*" | tail -n +2 | xargs rm -Rf
... repeat for other versions
$ find 1.3/examples/ -name "*.html" -exec sed -i '/^<h2 id="generated-output">Generated/,/^                $/{d}' {} \;
... repeat for other versions
$ find 1.3/examples/ -name "*.html" -exec sed -i '/^  <a href="#generated-output"/,/^  <\/a>$/{d}' {} \;
... repeat for other versions
$ git commit --amend -a
$ git push -f
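For reference, the same cull could be scripted in Python instead of find/sed. This is an illustrative equivalent only (not a script from the repo): it removes the per-dataset report directories under a version's examples/ and strips the "Generated output" anchor links from the remaining pages, mirroring the second sed expression:

```python
import re
import shutil
from pathlib import Path


def cull_examples(version_dir: str) -> None:
    """Remove report directories under <version>/examples/ and strip
    '#generated-output' anchor links from the remaining HTML pages."""
    examples = Path(version_dir) / "examples"
    for sub in sorted(p for p in examples.iterdir() if p.is_dir()):
        shutil.rmtree(sub)  # the bulky per-dataset report HTMLs
    # the anchor may span multiple lines, hence DOTALL
    anchor = re.compile(r'<a href="#generated-output".*?</a>', re.DOTALL)
    for page in examples.glob("*.html"):
        page.write_text(anchor.sub("", page.read_text()))
```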

After that, things looked okay locally. Deployment looks okay too, I think:

[before/after screenshots of the deployed site, taken 2023-07-05, omitted]

We can "just" use these commands every once in a while to cull the example HTML files until we find a way to systematically delete them...

hoechenberger commented 6 months ago

Agh, doc deployment failing again:

"fatal: pack exceeds maximum allowed size"

for the latest merge into main and, more importantly, for the latest tagged release

larsoner commented 6 months ago

Hmm, this is one reason we should maybe be doing point releases for little bugfixes; those would be deployed to the same directories. The only quick fix I can think of is to delete 1.5 now, for example.

larsoner commented 6 months ago

We should probably figure out how to get this down:

larsoner:~/python/mne-bids-pipeline$ du -hs 1.5
769M    1.5
larsoner:~/python/mne-bids-pipeline$ du -hs 1.6
854M    1.6
larsoner:~/python/mne-bids-pipeline$ du -hs 1.7
854M    1.7

and

$ du -hs 1.7/examples/
849M    1.7/examples/
$ du -hs 1.7/examples/ERP_CORE
510M    1.7/examples/ERP_CORE

So a good target might be the ERP_CORE reports, for example:

6.3M    1.7/examples/ERP_CORE/sub-019_ses-P3_task-P3_proc-ica+components_report.html
1.3M    1.7/examples/ERP_CORE/sub-019_ses-P3_task-P3_proc-icafit_report.html
1.5M    1.7/examples/ERP_CORE/sub-019_ses-P3_task-P3_proc-ica_report.html
6.5M    1.7/examples/ERP_CORE/sub-019_ses-P3_task-P3_report.html

So, back of the envelope, that's ~15 MB per subject per task, 15 × 5 × 7 = 525 MB. Options:

  1. Run all subjects but only upload one subject's reports, which seems reasonable -- that cuts this down greatly.
  2. Reduce the number of tasks -- not sure what the consequences are here esp. since we do different types of analyses on different tasks.
  3. Follow https://github.com/mne-tools/mne-bids-pipeline/issues/880#issuecomment-1991078989 and remove two of the ICA reports.
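For the record, the arithmetic behind the estimate and the expected savings from option (1). The subject and task counts here are inferred from the sizes above, not measured:

```python
# ~15 MB of report HTML per subject per task (from the du output),
# with an assumed 5 subjects and 7 ERP CORE tasks in the example.
mb_per_subject_task = 15
n_subjects, n_tasks = 5, 7

total_mb = mb_per_subject_task * n_subjects * n_tasks  # current footprint
single_subject_mb = mb_per_subject_task * 1 * n_tasks  # option (1): upload one subject
```

That matches the ~510 MB measured for 1.7/examples/ERP_CORE and suggests option (1) alone would recover roughly 400 MB per docs version.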

I'll try (1) and (3) in a PR and we can see how big the resulting doc build is.

hoechenberger commented 6 months ago

thanks for looking into this!

larsoner commented 6 months ago

All fixed, and example-removing scripts were added in #899, so we can hopefully stop removing old versions by hand (I forgot we could do that, otherwise I wouldn't have removed the older ones!)