pipilurj / BONAS


EA-Sampler #2

Open ascheppach opened 3 years ago

ascheppach commented 3 years ago

Thanks a lot for your work and your research contribution to neural architecture search with Bayesian optimization. I am implementing BONAS for genomic sequence data and was wondering whether the EA sampler is the final version. So far, I always get an error at line 39 of mutate.py in the opendomain_utils folder. The reason is that if 'node_to_disconnect=4', then 'range(node_to_disconnect+1, 7)' is range(5, 7), and if new_adj[node_to_disconnect][i] == 1 holds for none of these i, the result is an empty sequence that cannot be sampled from. Looking forward to your reply!
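The failure described above can be reproduced in isolation. The following is a minimal sketch, assuming a 7x7 adjacency matrix stored as nested lists and the standard-library `random.choice` as the sampling call (the actual call in mutate.py may differ); the matrix contents are made up for illustration.

```python
import random

# Hypothetical 7x7 adjacency matrix: node 4 has no outgoing edge to node 5 or 6.
new_adj = [[0] * 7 for _ in range(7)]
new_adj[0][4] = 1  # an unrelated edge, only so the matrix is not all zeros

node_to_disconnect = 4

# Same filter as the expression quoted from mutate.py.
possible_disconnects = [
    i for i in range(node_to_disconnect + 1, 7)
    if new_adj[node_to_disconnect][i] == 1
]

# Sampling from an empty list fails; random.choice raises IndexError here.
try:
    random.choice(possible_disconnects)
    sampling_failed = False
except IndexError:
    sampling_failed = True
```

With a different sampling function the exception type may differ, but the cause is the same: there is nothing to choose from.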

pipilurj commented 3 years ago

Yes, it is the final version. Unfortunately, the mutation in the EA sampler was hardcoded specifically for the DARTS search space, so I'm not sure it fits your scenario. In fact, I would recommend using the random sampler, since there is no significant difference between the random sampler and the EA sampler in terms of sample efficiency. If you can randomly generate many architectures, the random sampler may work better, since BONAS mainly relies on the GCN and Bayesian regressor for selecting candidates.
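The random-sampler workflow recommended here can be sketched roughly as follows: draw a large pool of random candidates, score each with a cheap predictor, and keep the best few. This is only an illustration; `random_arch`, `predict_score` and `select_candidates` are hypothetical names, and the scorer is a dummy stand-in for the GCN and Bayesian regressor, not BONAS code.

```python
import random


def random_arch(rng, num_nodes=7):
    """Hypothetical generator: a random upper-triangular 0/1 adjacency matrix."""
    return [
        [rng.randint(0, 1) if j > i else 0 for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def predict_score(arch):
    """Dummy stand-in for the GCN + Bayesian regressor: counts edges."""
    return sum(map(sum, arch))


def select_candidates(pool_size, top_k, rng):
    """Score a large random pool and return the top_k highest-scoring archs."""
    pool = [random_arch(rng) for _ in range(pool_size)]
    return sorted(pool, key=predict_score, reverse=True)[:top_k]


selected = select_candidates(pool_size=1000, top_k=10, rng=random.Random(0))
```

Because scoring is cheap, `pool_size` can be made large; only the `top_k` selected architectures would need to be trained.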

ascheppach commented 3 years ago

Thanks for your fast reply! Currently I am using your original code, so the error is not due to a new scenario. As I said, the line 'possible_disconnects = [i for i in range(node_to_disconnect+1, 7) if new_adj[node_to_disconnect][i] == 1]' in mutate.py will often produce an empty sequence, namely when 'new_adj[node_to_disconnect][i] == 0' for all i. Are you sure random sampling performs as well as EA sampling? In Figure 5 of your paper, BONAS_random does not perform as well as BONAS.

pipilurj commented 3 years ago

Yes. Note that for NASBench101 the search space is limited to more than 420,000 architectures, so for a fair comparison we limit the sample size of the random sampler to a small number (0.01% of the total search space). For open-domain search, however, we can sample as many candidates as we want, since predicting a candidate using the GCN + Bayesian regressor requires very little time. Thank you very much for pointing out the bug. You can try the following: if the sequence is empty, skip the arch and continue sampling.
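The suggested workaround (skip the architecture when the sequence is empty and keep sampling) could look roughly like the sketch below. It is not the repository's code: `try_disconnect`, `sample_mutations` and `max_tries` are hypothetical names, and the adjacency matrix is assumed to be a 7x7 nested list.

```python
import copy
import random

NUM_NODES = 7  # matches the upper bound in range(node_to_disconnect + 1, 7)


def try_disconnect(adj, node_to_disconnect, rng):
    """Return a copy of adj with one outgoing edge of the node removed,
    or None if the node has no outgoing edge to remove."""
    possible_disconnects = [
        i for i in range(node_to_disconnect + 1, NUM_NODES)
        if adj[node_to_disconnect][i] == 1
    ]
    if not possible_disconnects:
        return None  # empty sequence: tell the caller to skip this attempt
    new_adj = copy.deepcopy(adj)
    new_adj[node_to_disconnect][rng.choice(possible_disconnects)] = 0
    return new_adj


def sample_mutations(adj, n_samples, rng, max_tries=1000):
    """Collect up to n_samples mutated matrices, skipping failed attempts."""
    mutated = []
    for _ in range(max_tries):
        if len(mutated) == n_samples:
            break
        node = rng.randrange(NUM_NODES - 1)  # the last node has no successors
        candidate = try_disconnect(adj, node, rng)
        if candidate is None:
            continue  # skip the arch and continue sampling
        mutated.append(candidate)
    return mutated
```

The `max_tries` bound is a design choice of this sketch: it prevents an endless loop when the input matrix has no edge that can be removed at all.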

ascheppach commented 3 years ago

Yes, I get your point. Since predicting a candidate is computationally cheap, I can just use random sampling and increase the number of samples per iteration, right? I have another question regarding the file dynamic_generate.py in the data_generators folder. The created adjacency matrix has shape [11,11], and each node of the DAG has 2 columns and 2 rows, right? As we have 4 nodes, we end up with 8 columns for these 4 nodes. But why don't we just use 4 columns (1 column per node), where each column would then receive 2 inputs (two 1's)? Thank you in advance for your reply.

pipilurj commented 3 years ago

The idea is as follows: the "nodes" in the DAG have two inputs, and each input is associated with an independent operation. Thus, to form the input for the GCN, we treat each input as a node, rather than the actual node, since each input needs to have an independent feature (which is an operation in our case). Hopefully I explained it right.
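One way to picture this encoding is the sketch below. It assumes the 11 rows and columns are laid out as 2 cell inputs, then 2 input-nodes for each of the 4 DAG nodes, then 1 output. Only the [11, 11] shape and the 8 input-nodes come from this thread; the index layout, the operation names and the helper `input_slots` are hypothetical.

```python
NUM_CELL_INPUTS = 2   # assumed: two cell-level inputs
NUM_DAG_NODES = 4     # from the thread: 4 nodes in the DAG
INPUTS_PER_NODE = 2   # from the thread: each node has two inputs
SIZE = NUM_CELL_INPUTS + NUM_DAG_NODES * INPUTS_PER_NODE + 1  # 11, last index assumed to be the output


def input_slots(dag_node):
    """Indices of the two input-nodes belonging to one DAG node (assumed layout)."""
    start = NUM_CELL_INPUTS + dag_node * INPUTS_PER_NODE
    return [start, start + 1]


# One feature (operation) per input-node; a single column per DAG node
# could not carry two different operations. Operation names are placeholders.
ops = {
    input_slots(0)[0]: "sep_conv_3x3",
    input_slots(0)[1]: "max_pool_3x3",
}

mat = [[0] * SIZE for _ in range(SIZE)]

# DAG node 0 reads both cell inputs, each through its own operation.
mat[0][input_slots(0)[0]] = 1
mat[1][input_slots(0)[1]] = 1

# The first input of DAG node 1 reads DAG node 0, so both input-nodes of
# node 0 (indices 2 and 3 in this layout) point to index 4.
for a in input_slots(0):
    mat[a][input_slots(1)[0]] = 1
```

Under this assumed layout, a single logical connection between two DAG nodes shows up as two 1's in the matrix, one per input-node of the source.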

ascheppach commented 3 years ago

Thank you, I understand your explanation. But why do you then assign only one new connection with 'new_adj[node_to_connect, to_connect] = 1' in line 40 of mutate.py (folder opendomain_utils)? In data_generators/dynamic_generate.py you assign 2 inputs, since there you have 'mat[a, 4] = 1' with a being [2,3] (2 values), while node_to_connect is always only 1 value.