tskit-dev / tszip

Gzip-like compression for tskit tree sequences
https://tszip.readthedocs.io/
MIT License

AttributeError: 'generator' object has no attribute 'tables' #26

Closed agladstein closed 5 years ago

agladstein commented 5 years ago

I'm getting an error from my simulation code:

def sim_single_pulse_uni_AB(sample_size, L, u, r, locus_replicates, fixed_params=None):
    if fixed_params['u']:
        u = fixed_params['u']
    if not fixed_params['NA']:
        NA = selectVal(1000, 40000)
    else:
        NA = fixed_params['NA']
    if not fixed_params['NB']:
        NB = selectVal(1000, 40000)
    else:
        NB = fixed_params['NB']
    if not fixed_params['td']:
        td = selectVal(100, 3499.99)
    else:
        td = fixed_params['td']
    if not fixed_params['tm']:
        tm = td/10
    else:
        tm = fixed_params['tm']
    if not fixed_params['m_AB']:
        m_AB = selectVal(0.1, 0.9)
    else:
        m_AB = fixed_params['m_AB']
    if not fixed_params['seed']:
        seed = random.randint(1, 2**32-1)
    else:
        seed = fixed_params['seed']
    print('NA: {}'.format(NA))
    print('NB: {}'.format(NB))
    print('tm: {}'.format(tm))
    print('td: {}'.format(td))
    print('m_AB: {}'.format(m_AB))
    print('seed: {}'.format(seed))
    A, B = 0, 1
    population_configurations = [
        msprime.PopulationConfiguration(
            sample_size=int(sample_size/2), initial_size=NA),
        msprime.PopulationConfiguration(
            sample_size=int(sample_size/2), initial_size=NB)
    ]
    demographic_events = [
        msprime.MassMigration(
            time=tm, source=A, destination=B, proportion=m_AB),
        msprime.MassMigration(
            time=td, source=B, destination=A, proportion=1.0)
    ]
    tree = msprime.simulate(population_configurations=population_configurations,
                            demographic_events=demographic_events,
                            length=L,
                            recombination_rate=r,
                            mutation_rate=u,
                            random_seed=seed,
                            num_replicates=locus_replicates)
    tszip.compress(tree, "simulation.trees.tsz")

Traceback (most recent call last):
  File "src/data/simulate_msprime.py", line 218, in <module>
    main()
  File "src/data/simulate_msprime.py", line 206, in main
    max_snps = sim_locus_reps(model_func, sample_size, L, u, r, fixed_params, param_file_path, j, out_file_path, max_snps, locus_replicates)
  File "src/data/simulate_msprime.py", line 156, in sim_locus_reps
    tree_replicates, params, y, label = model_func(sample_size, L, u, r, locus_replicates, fixed_params)
  File "/Users/agladsteinNew/dev/cnn_classify_demography/src/data/demographic_models.py", line 336, in sim_single_pulse_uni_AB
    tszip.compress(tree, "simulation.trees.tsz")
  File "/Users/agladsteinNew/.local/share/virtualenvs/cnn_classify_demography-zZ5GtBgD/lib/python3.7/site-packages/tszip/compression.py", line 97, in compress
    compress_zarr(ts, root, variants_only=variants_only)
  File "/Users/agladsteinNew/.local/share/virtualenvs/cnn_classify_demography-zZ5GtBgD/lib/python3.7/site-packages/tszip/compression.py", line 160, in compress_zarr
    tables = ts.tables
AttributeError: 'generator' object has no attribute 'tables'

jeromekelleher commented 5 years ago

When you use the num_replicates option, msprime.simulate returns an iterator over many tree sequences rather than a single tree sequence, and tszip.compress expects a single one. You'll need to do something like:

reps = msprime.simulate(stuff...)
for j, ts in enumerate(reps):
    tszip.compress(ts, f"simulation_{j}.trees.tsz")
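
If it helps, here is a minimal sketch of the same loop wrapped in a small helper. The name compress_replicates, its arguments, and the file name pattern are made up for illustration and are not part of tszip or msprime. The compress argument is whatever callable writes one tree sequence to a path, which in real use would be tszip.compress.

```python
def compress_replicates(replicates, compress, prefix="simulation"):
    """Write each tree sequence yielded by `replicates` to its own file.

    Hypothetical helper: `replicates` is any iterable of tree sequences
    (for example the iterator returned when num_replicates is set), and
    `compress` is a callable taking (tree_sequence, path), such as
    tszip.compress.
    """
    paths = []
    for j, ts in enumerate(replicates):
        # One output file per replicate, numbered in iteration order.
        path = f"{prefix}_{j}.trees.tsz"
        compress(ts, path)
        paths.append(path)
    return paths
```

You would call it as compress_replicates(reps, tszip.compress), and the returned list of paths is handy if you want to post-process the files afterwards.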

agladstein commented 5 years ago

oooh... I misunderstood. I thought I would be able to put multiple replicates in one file. So that functionality doesn't exist, right? I could just compress each replicate and then make a tarball?

jeromekelleher commented 5 years ago

No, there's no functionality to compress lots of replicates into one file. The way to do that would be to make a tarball of the compressed files (but, unless there are thousands of them, is there much point?)
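
For what it's worth, a rough sketch of the tarball route using only the Python standard library might look like the following. The function name bundle and the archive name are assumptions for illustration. The archive is left uncompressed because the .tsz members are already compressed, so gzipping the tar again is unlikely to gain much.

```python
import tarfile
from pathlib import Path


def bundle(paths, archive_path):
    """Collect already-compressed files into a single tar archive.

    Hypothetical helper: `paths` is a list of existing file paths
    (for example the per-replicate .tsz files) and `archive_path` is
    where the tar file is written.
    """
    # Mode "w" writes a plain tar with no additional compression.
    with tarfile.open(archive_path, "w") as tar:
        for p in paths:
            p = Path(p)
            # Store only the file name so the archive has no directory prefix.
            tar.add(str(p), arcname=p.name)
    return archive_path
```

Something like bundle(paths, "simulations.tar") then leaves one file per batch of replicates. Each member still has to be extracted before it can be decompressed individually.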

agladstein commented 5 years ago

Just to keep simulation files more organized and reduce the number of files.