Open leeanapeters opened 1 year ago
Hi, I am also using this tool with large datasets (~150k sequences). The KNN classification returns an empty knn_seq.pkl and the error below. Have you ever encountered this error? I suspect it may be an out-of-memory issue in the KNN step.
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_15992/968723552.py in <module>

~/deeptcr/lib/python3.7/site-packages/DeepTCR/DeepTCR.py in KNN_Sequence_Classifier(self, folds, k_values, rep, plot_metrics, by_class, plot_type, metrics, n_jobs, Load_Prev_Data)
   2429         if plot_metrics is True:
   2430             if by_class is True:
-> 2431                 sns.catplot(data=df_out, x='Metric', y='Value', hue='Classes', kind=plot_type)
   2432             else:
   2433                 sns.catplot(data=df_out, x='Metric', y='Value', kind=plot_type)

~/deeptcr/lib/python3.7/site-packages/seaborn/_decorators.py in inner_f(*args, **kwargs)
     44             )
     45         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46         return f(**kwargs)
     47     return inner_f
     48

~/deeptcr/lib/python3.7/site-packages/seaborn/categorical.py in catplot(x, y, hue, data, row, col, col_wrap, estimator, ci, n_boot, units, seed, order, hue_order, row_order, col_order, kind, height, aspect, orient, color, palette, legend, legend_out, sharex, sharey, margin_titles, facet_kws, **kwargs)
   3801     # so we need to define palette to get default behavior for the
   3802     # categorical functions
-> 3803     p.establish_colors(color, palette, 1)
   3804     if kind != "point" or hue is not None:
   3805         palette = p.colors

~/deeptcr/lib/python3.7/site-packages/seaborn/categorical.py in establish_colors(self, color, palette, saturation)
    317         # Determine the gray color to use for the lines framing the plot
    318         light_vals = [colorsys.rgb_to_hls(c)[1] for c in rgb_colors]
--> 319         lum = min(light_vals) * .6
    320         gray = mpl.colors.rgb2hex((lum, lum, lum))
    321

ValueError: min() arg is an empty sequence
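A note on reading the traceback above: the last frame fails inside `min(light_vals)`, where `light_vals` is built from the plot's colors. That list is empty only when there is nothing to draw, so one plausible reading (an assumption, not confirmed in this thread) is that `df_out` had no rows when it reached `sns.catplot`, which would be consistent with the empty knn_seq.pkl. It does not show why the table was empty, so it neither confirms nor rules out a memory problem. The sketch below reproduces the failure mode in plain Python and shows a hypothetical guard, `require_rows`, that would fail earlier with a clearer message. Both function names are illustrative and are not part of DeepTCR or seaborn.

```python
def darkest_lightness(light_vals, scale=0.6):
    """Mimic the failing line: take min() over per-color lightness values."""
    return min(light_vals) * scale


def require_rows(rows, name="df_out"):
    """Hypothetical guard: stop before plotting if the results table is empty."""
    if len(rows) == 0:
        raise RuntimeError(
            f"{name} is empty: no metrics were computed, so there is nothing to plot"
        )
    return rows


# An empty list of colors triggers the same kind of error as in the traceback.
try:
    darkest_lightness([])
except ValueError as err:
    print("plotting step failed:", err)

# The guard turns the obscure plotting error into an explicit one.
try:
    require_rows([])
except RuntimeError as err:
    print("guard failed early:", err)
```

Checking whether the metrics table is empty before plotting (or calling the classifier with `plot_metrics=False`, a parameter visible in the signature above) may help separate a plotting problem from a problem in the classification step itself.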
Hi, thank you for creating this great tool!
I was wondering if you could offer some guidance on handling large datasets in the unsupervised workflow? In particular, the clustering and KNN classification steps seem to be prohibitively memory-expensive.
I think that downsampling is interfering with the classification accuracy, so I would like to use all the data if possible.
Thanks so much for your help!
Leeana
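For context on the memory concern: if the KNN step materializes a dense all-against-all distance matrix (an assumption, the thread does not confirm how DeepTCR computes distances), then 150k sequences at 8 bytes per value would need roughly 168 GiB. A common way around this is to classify one query at a time and keep only the k nearest neighbors seen so far, so the full matrix never exists. The sketch below illustrates that idea with the standard library only. The feature vectors, labels, and function names are hypothetical, and real use on 150k sequences would want a vectorized or index-based implementation for speed.

```python
import heapq
from collections import Counter


def full_matrix_gib(n, bytes_per_value=8):
    """Size in GiB of a dense n x n distance matrix."""
    return n * n * bytes_per_value / 2**30


def squared_distance(a, b):
    """Squared Euclidean distance; it ranks neighbors the same as the true distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_x, train_y, queries, k=5):
    """Majority-vote KNN that holds only k candidate neighbors per query."""
    predictions = []
    for q in queries:
        # Generate (distance, label) pairs lazily; nsmallest keeps just k of them.
        pairs = ((squared_distance(q, x), y) for x, y in zip(train_x, train_y))
        nearest = heapq.nsmallest(k, pairs, key=lambda pair: pair[0])
        votes = Counter(label for _, label in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy data: two well-separated groups of 2-D feature vectors (made up for illustration).
train_x = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(round(full_matrix_gib(150_000), 1), "GiB for a dense 150k x 150k matrix")
print(knn_predict(train_x, train_y, [(0.2, 0.2), (10.5, 10.5)], k=3))
```

The trade-off is time for memory: every query still scans all training points, but peak memory stays proportional to k rather than to the number of sequences squared.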