interpreting-rl-behavior / interpreting-rl-behavior.github.io

Code for the site https://interpreting-rl-behavior.github.io/

Concatenate all images from a sequence to one file #35

Closed: danbraunai closed this issue 2 years ago

danbraunai commented 2 years ago

Currently, data_loader.py creates one image per timestep in a sequence, and the frontend loads all of those images together. It would be better to have a more lightweight setup with one image per sequence, or even one image covering several sequences.
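For concreteness, here's a minimal sketch of what the backend half could look like (the filenames, frame shapes, and use of PIL/numpy are assumptions, not the actual data_loader.py layout):

```python
# Sketch only: assumes equally sized RGB frames saved per timestep
# (e.g. frame_000.png, frame_001.png, ...) for one sequence.
import numpy as np
from PIL import Image

def concat_sequence(frame_paths, out_path):
    """Horizontally concatenate per-timestep frames into one strip image."""
    frames = [np.asarray(Image.open(p).convert("RGB")) for p in frame_paths]
    strip = np.concatenate(frames, axis=1)  # stack along the width axis
    Image.fromarray(strip).save(out_path)
```

The frontend would then fetch one file per sequence and index into it by frame width, instead of requesting a separate image for every timestep.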

Either Nix or I would be happy to do this, on both the backend and the frontend.

danbraunai commented 2 years ago

Hopefully this will allow us to load more samples than the panel.html page can currently handle (Lee reports it tops out at ~300 samples).

danbraunai commented 2 years ago

@leesharkey My npm run dev also failed to handle more than 200 samples.

If, like me, you get the error Error: ENOSPC: System limit for number of file watchers reached, then you can increase the maximum number of file watchers (with apparently no obviously bad repercussions) with:

  sudo sh -c "echo fs.inotify.max_user_watches=524288 >> /etc/sysctl.conf"

If, after this, you end up with the error FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory, then you should install the latest version of node. I used the nvm package to do this, which is the recommended method (JS is absolute chaos):

  curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
  nvm ls-remote
  nvm install [latest stable version from above]

Following this, I'm able to get npm run dev working with 400 samples, which is the maximum number of samples in the for_daniel dataset you sent me that have all of the data needed to run import_data.py.

If this still fails for you, or fails when we try with >400 samples, then I can go back to concatenating the images within each sample, or even across samples. This doesn't reduce the total image file size, but it seemed to play nicer with node during compilation when I tested it: it reduces the number of files node needs to "watch" for hot-reloading, and it also seems to reduce overall memory consumption in the node process, possibly for other reasons. I haven't written the code to actually read and step through the concatenated images in the panel, though, so I'll only do this if we need it.
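For reference, stepping through a horizontally concatenated strip is just fixed-width slicing. A rough sketch of the indexing the panel would need to mirror (frame_width and the strip layout are assumptions):

```python
import numpy as np

def get_frame(strip: np.ndarray, t: int, frame_width: int) -> np.ndarray:
    """Return frame t from a strip of shape (height, num_timesteps * frame_width, 3)."""
    return strip[:, t * frame_width:(t + 1) * frame_width, :]
```

On the frontend, the same effect can be achieved without extra requests by offsetting the displayed image by t * frame_width pixels (the usual sprite-sheet trick).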

If this works for you for the number of samples you think we'll want, feel free to close this issue.

leesharkey commented 2 years ago

Oh awesome! I suspect it works for over 400 samples then! This is great and will make interp a lot better.

Closing this, and will reopen if I run into issues.

leesharkey commented 2 years ago

So your solution got me to 500 samples. But when I tried 2000 samples, I ran into this issue:

lee@zenith:~/Documents/AI_ML_neur_projects/aisc_project/Brewing1.github.io$ npm run dev

> dev
> cross-env NODE_ENV=development webpack serve --hot

ℹ 「wds」: Project is running at http://localhost:8080/
ℹ 「wds」: webpack output is served from /
ℹ 「wds」: Content not from webpack is served from /home/lee/Documents/AI_ML_neur_projects/aisc_project/Brewing1.github.io/docs
ℹ 「wds」: 404s will fallback to /index.html
Browserslist: caniuse-lite is outdated. Please run:
  npx browserslist@latest --update-db
  Why you should do it regularly: https://github.com/browserslist/browserslist#browsers-data-updating

<--- Last few GCs --->

[33541:0x537d960]    38525 ms: Scavenge 4036.5 (4121.9) -> 4030.8 (4125.7) MB, 11.1 / 0.0 ms  (average mu = 0.805, current mu = 0.749) allocation failure 
[33541:0x537d960]    38551 ms: Scavenge 4040.4 (4125.9) -> 4036.0 (4133.2) MB, 11.6 / 0.0 ms  (average mu = 0.805, current mu = 0.749) allocation failure 
[33541:0x537d960]    39397 ms: Mark-sweep 4048.5 (4134.2) -> 4042.4 (4144.7) MB, 831.1 / 0.0 ms  (average mu = 0.656, current mu = 0.199) allocation failure scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb09980 node::Abort() [webpack]
 2: 0xa1c235 node::FatalError(char const*, char const*) [webpack]
 3: 0xcf784e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [webpack]
 4: 0xcf7bc7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [webpack]
 5: 0xeaf465  [webpack]
 6: 0xebf12d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [webpack]
 7: 0xec1e2e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [webpack]
 8: 0xe8336a v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [webpack]
 9: 0x11fc0b6 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [webpack]
10: 0x15f0b19  [webpack]

I found this suggestion online, so I ran:

export NODE_OPTIONS="--max-old-space-size=524288"

This got me a new error:

lee@zenith:~/Documents/AI_ML_neur_projects/aisc_project/Brewing1.github.io$ npm run dev

> dev
> cross-env NODE_ENV=development webpack serve --hot

ℹ 「wds」: Project is running at http://localhost:8080/
ℹ 「wds」: webpack output is served from /
ℹ 「wds」: Content not from webpack is served from /home/lee/Documents/AI_ML_neur_projects/aisc_project/Brewing1.github.io/docs
ℹ 「wds」: 404s will fallback to /index.html
Browserslist: caniuse-lite is outdated. Please run:
  npx browserslist@latest --update-db
  Why you should do it regularly: https://github.com/browserslist/browserslist#browsers-data-updating
/home/lee/Documents/AI_ML_neur_projects/aisc_project/Brewing1.github.io/node_modules/copy-webpack-plugin/dist/index.js:568
            assetMap.get(priority).push(...assets);
                                   ^

RangeError: Maximum call stack size exceeded
    at /home/lee/Documents/AI_ML_neur_projects/aisc_project/Brewing1.github.io/node_modules/copy-webpack-plugin/dist/index.js:568:36

When I google the error message, the suggestion is that it might be because a function is being called too many times (i.e. the call stack is getting too deep). But I'm in no way confident of that assessment.

danbraunai commented 2 years ago

Yep, your first error can be fixed with export NODE_OPTIONS="--max-old-space-size=8192". Note that the number in the command is in MB, so 8192 is 8 GB (the 524288 you used is 524 GB, which I guess just gets capped at whatever your machine has). Also note that you might want to put this in ~/.bashrc so it doesn't reset between shells.

The second error seems to come from CopyWebpackPlugin when it tries to copy all of the files in "static" to the JavaScript build directory. In fairness to the plugin, 1000 samples produces 210,000 files! I haven't found a way to get CopyWebpackPlugin to handle copying that many files. Having so many files is also a little gross in general, and it might lead to other errors.

We can reduce the number of files by a factor of num_timesteps by horizontally concatenating all the images in a sequence. I wrote code for this in import_data.py a couple of weeks ago (it is commented out on line 199), but hadn't implemented anything on the frontend to handle the concatenated images. I did test loading 1000 samples into the frontend with the images concatenated, and there were no errors, so that's a good sign. If we want to reduce the file count by another factor of 20, we could also vertically concatenate the images across types (e.g. obs, sal_hx_1, sal_hx_4, ...). That would be a little messier to handle on the frontend.
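To illustrate that second option, here's a rough sketch of building one grid image per sample, with one row per image type and one column per timestep (the type names, file layout, and helper names are assumptions; this is not the commented-out code on line 199):

```python
import numpy as np
from PIL import Image

def concat_sample(frames_by_type, out_path):
    """Concatenate all of a sample's images into a single grid image.

    frames_by_type maps an image type (e.g. "obs", "sal_hx_1", "sal_hx_4") to
    its ordered list of per-timestep frame paths. Rows are types, columns are
    timesteps. Assumes every frame has the same height and width and every
    type has the same number of timesteps.
    """
    rows = []
    for type_name in sorted(frames_by_type):
        frames = [np.asarray(Image.open(p).convert("RGB"))
                  for p in frames_by_type[type_name]]
        rows.append(np.concatenate(frames, axis=1))  # horizontal: one strip per type
    grid = np.concatenate(rows, axis=0)              # vertical: stack the strips
    Image.fromarray(grid).save(out_path)
```

The frontend would then crop by (row, column) to recover a given type and timestep, which is what makes this variant messier to handle than the per-sequence strips.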

These are my current picks for things to do:

I can try to implement one or several of these tomorrow; let me know what you think.

danbraunai commented 2 years ago

I implemented the first option above, and it lets me load the 1000 samples with no problem. Lmk if you have trouble loading lots of samples; otherwise, you can close this issue.

leesharkey commented 2 years ago

It works with 4000 samples. That's epic.

Thanks for this!