leelabcnbc / thesis-yimeng-v2

good parts of thesis-yimeng-v1, better refactoring.

thesis-yimeng-v2

This file is a revised version of the original documentation.

The current version only covers reproducing the results in Chapter 4 of the thesis.

$ROOT refers to the repository root.

for CMUers

Everything can be found in the following places on the mind cluster.

data

The raw 8K data can be found in yuanyuan_8k_neural.hdf5 and yuanyuan_8k_images.hdf5 under /user_data/yimengzh/thesis-yimeng-v2/results/datasets/raw. These two files contain recordings from six days; we used data from three of them. They were generated from source MATLAB files, which were in turn generated from raw recording data. This repo contains scripts to convert the MATLAB files into the above HDF5 format, as described below; the raw MATLAB files were produced by spike sorting plus format conversion, and Summer/Yuanyuan should know more about that process.
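As a quick sanity check, HDF5 files like these can be inspected with h5py. The snippet below is a minimal sketch; the internal dataset layout of the real files is an assumption here, so the demo runs on a small synthetic file instead.

```python
import os
import tempfile

import h5py
import numpy as np

def list_hdf5_datasets(path):
    """Map every dataset in an HDF5 file to its shape."""
    contents = {}

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            contents[name] = obj.shape

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return contents

# Demo on a small synthetic file; for the real data, point the function
# at yuanyuan_8k_neural.hdf5 / yuanyuan_8k_images.hdf5 instead.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("day1/neural", data=np.zeros((100, 50), dtype=np.float32))

print(list_hdf5_datasets(demo_path))  # {'day1/neural': (100, 50)}
```

Listing dataset names and shapes this way is a cheap check that a download or conversion produced what you expect before starting any training run.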

The raw NS 2250 data can be found in /user_data/yimengzh/gaya-data/data/tang/batch/final/tang_neural.npy and /user_data/yimengzh/gaya-data/data/tang/images/all_imags.npy. Hal knows more about how these NumPy files were generated from the raw recording data.
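For large .npy arrays like these, np.load with mmap_mode avoids pulling the whole file into memory. A minimal sketch on a synthetic array (the real files' shapes and dtypes are not documented here, so the ones below are placeholders):

```python
import os
import tempfile

import numpy as np

# Stand-in for the real cluster files (tang_neural.npy / all_imags.npy).
path = os.path.join(tempfile.mkdtemp(), "toy_neural.npy")
np.save(path, np.zeros((10, 5), dtype=np.float32))

# mmap_mode="r" memory-maps the array instead of loading it eagerly,
# which matters for multi-gigabyte recordings.
arr = np.load(path, mmap_mode="r")
print(arr.shape, arr.dtype)  # (10, 5) float32
```

Slicing a memory-mapped array (e.g. `arr[:100]`) reads only the touched pages from disk, so quick inspections stay fast even on the full-size files.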

dependencies

reproduce results

The steps should work on the CNBC cluster (mind) and will also work on a single machine with some small adaptations.

All the actual computation is done inside the Singularity container.

preprocess neural data

ImageNet 8K

  1. First, download the ImageNet 8K data by running the following command OUTSIDE the container.
    $ROOT/setup_private_data.sh
  2. Then run the following inside the container.
    python $ROOT/scripts/preprocessing/raw_data.py
    python $ROOT/scripts/preprocessing/prepared_data.py

NS 2250

Ask Hal about it. This code repo uses Hal's code under the hood to obtain the data.

model training

All commands should be run outside the container, in a basic Python 3.6+ environment with no additional dependencies. On the CNBC cluster, such an environment can be set up with scl enable rh-python36 bash.
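A quick way to confirm the environment meets the 3.6+ requirement before launching jobs (the scl line applies only on the CNBC cluster):

```shell
# On the CNBC cluster, first enter the Python 3.6 environment:
#   scl enable rh-python36 bash
# Then verify the interpreter version:
python3 -c 'import sys; assert sys.version_info >= (3, 6), sys.version'
echo "python version ok"
```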

main models (recurrent and feed-forward, no ablation)

ImageNet 8K

Run the following files under $ROOT/scripts/training/yuanyuan_8k_a_3day/maskcnn_polished_with_rcnn_k_bl. Taken together, these files may train some extra models, but they form the minimal set required to cover all models used in the paper.

NS 2250

Run the following files under $ROOT/scripts/training/gaya/maskcnn_polished_with_rcnn_k_bl. Taken together, these files may train some extra models, but they form the minimal set required to cover all models used in the paper.

multi-path models that correspond to recurrent models

Only 8/16/32 ch models were considered; higher channel counts run out of memory (OOM) more often, making the results less useful.

ImageNet 8K

Run the following files under $ROOT/scripts/training/yuanyuan_8k_a_3day/maskcnn_polished_with_rcnn_k_bl. Taken together, these files may train some extra models, but they form the minimal set required to cover all models used in the paper.

NS 2250

Run the following files under $ROOT/scripts/training/gaya/maskcnn_polished_with_rcnn_k_bl. Taken together, these files may train some extra models, but they form the minimal set required to cover all models used in the paper.

ablated multi-path models

Only 16/32 ch, 2 L models trained on all data were considered, as these models had the lowest memory requirements and matched the recurrent models best.

ImageNet 8K

Run the following files under $ROOT/scripts/training/yuanyuan_8k_a_3day/maskcnn_polished_with_rcnn_k_bl. Taken together, these files may train some extra models, but they form the minimal set required to cover all models used in the paper.

NS 2250

Run the following files under $ROOT/scripts/training/gaya/maskcnn_polished_with_rcnn_k_bl. Taken together, these files may train some extra models, but they form the minimal set required to cover all models used in the paper.

plots

Check the files under results_thesis.