nchristensen/feintune
Autotune batched einsum loopy programs
MIT License
2 stars, 0 forks
issues
Newest
Forming subkernels is slow on large kernels
#26
nchristensen
opened
10 months ago
0
Implement tensor decomposition based tuning
#25
nchristensen
opened
10 months ago
0
Use the subkernel with the most flops for single batch tests.
#24
nchristensen
opened
11 months ago
0
Kernel execution failures can crash autotuner
#23
nchristensen
opened
11 months ago
2
ytopt tuning with MPI appears to deadlock
#22
nchristensen
opened
11 months ago
1
Higher order kernels sometimes crash
#21
nchristensen
opened
11 months ago
1
Switch from frozendict to immutabledict or immutables.Map
#20
nchristensen
closed
10 months ago
2
os.makedirs doesn't create directories on Crusher execution nodes
#19
nchristensen
opened
1 year ago
0
Timed out kernel execution times can be a multiple of the timeout time
#18
nchristensen
closed
12 months ago
0
Address space of prefetch arrays can't always be determined
#17
nchristensen
opened
1 year ago
1
Allow similar kernels to mutually inform tuning
#16
nchristensen
opened
1 year ago
0
Fix sigma=0 for tuning hyperparameters
#15
nchristensen
opened
1 year ago
0
Einsums can be ordered so batches need fewer prefetches
#14
nchristensen
closed
1 year ago
1
Add generator option for no prefetching
#13
nchristensen
closed
1 year ago
0
Reduce prefetching by putting einsums with the same prefetches in the same subbatches when possible
#12
nchristensen
closed
1 year ago
0
Check for assignment instructions with redundant rhs within same subkernel
#11
nchristensen
opened
1 year ago
0
Temporaries are sometimes redundantly aliased
#10
nchristensen
closed
1 year ago
0
Local memory usage is underestimated on einsum 4 to 2 kernels
#9
nchristensen
closed
1 year ago
0
Implement recomposition of the original batched kernels from tuned subkernels
#8
nchristensen
closed
1 year ago
0
compute_smoothed_char_length_11 has redundant instruction ids
#7
nchristensen
closed
1 year ago
1
Batching kernels with only "for" tags is much faster than with more complicated tags
#6
nchristensen
closed
1 year ago
3
Four einsum test case errors during code generation
#5
nchristensen
closed
1 year ago
1
Code generation of large kernels is too slow for autotuning
#4
nchristensen
opened
1 year ago
2
Kernel is unschedulable with batching when all loops are tagged "for"
#3
nchristensen
closed
1 year ago
1
Aliasing interferes with schedulability
#2
nchristensen
closed
1 year ago
1
Slabs don't work when loops share tags
#1
nchristensen
opened
1 year ago
2