Running on Windows 10 with a single 3070 GPU: no training log is printed and no error is raised

Henry-Avery commented 1 year ago

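PS D:\Learncode\RFresearch\WiFi-CSI-Sensing-Benchmark-main> python run.py --model MLP --dataset Widar
using dataset: Widar
using model: MLP

Only the selected model and dataset are printed. I tried other models and datasets as well, and training never starts. Does this have to run on Linux?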

Henry-Avery commented 1 year ago

Here is the traceback after I stopped the run:

Traceback (most recent call last):
  File "D:\Learncode\RFresearch\WiFi-CSI-Sensing-Benchmark-main\run.py", line 92, in <module>
    if __name__ == "__main__":
  File "D:\Learncode\RFresearch\WiFi-CSI-Sensing-Benchmark-main\run.py", line 74, in main
    for data in tensor_loader:
  File "D:\Users\84909\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 628, in __next__
    data = self._next_data()
  File "D:\Users\84909\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 671, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "D:\Users\84909\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\Users\84909\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\Learncode\RFresearch\WiFi-CSI-Sensing-Benchmark-main\dataset.py", line 88, in __getitem__
    x = np.genfromtxt(sample_dir, delimiter=',')
  File "D:\Users\84909\anaconda3\lib\site-packages\numpy\lib\npyio.py", line 1997, in genfromtxt
    converters = [StringConverter(dtype, locked=True,
  File "D:\Users\84909\anaconda3\lib\site-packages\numpy\lib\npyio.py", line 1997, in <listcomp>
    converters = [StringConverter(dtype, locked=True,
KeyboardInterrupt

Henry-Avery commented 1 year ago

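After a long wait, some very strange results appeared, and I was quite taken aback. Also, although the GPU shows a running process, its utilization stays very low, while disk reads hold steady at around 20 MB/s. What might need to be changed to run this on Windows?

PS D:\Learncode\RFresearch\WiFi-CSI-Sensing-Benchmark-main> python run.py --model MLP --dataset Widar
using dataset: Widar
using model: MLP
Epoch:1, Accuracy:0.9982,Loss:0.006224212
Epoch:2, Accuracy:1.0000,Loss:0.000000000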
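The traceback above was interrupted inside np.genfromtxt, called from __getitem__ in dataset.py, which suggests every sample is parsed from a text CSV each time it is loaded; slow per-sample parsing would fit the low GPU utilization. As a rough check, here is a minimal timing sketch. The helper name time_csv_parse and the synthetic 20x20 file are hypothetical stand-ins, not the real Widar sample layout; point `path` at one of your own CSV samples to measure the real cost.

```python
import os
import tempfile
import time

import numpy as np


def time_csv_parse(path, repeats=5):
    """Average wall-clock seconds np.genfromtxt needs to parse one CSV file.

    `repeats` must be at least 1. Returns (seconds_per_call, parsed_shape).
    """
    start = time.perf_counter()
    for _ in range(repeats):
        x = np.genfromtxt(path, delimiter=',')
    return (time.perf_counter() - start) / repeats, x.shape


if __name__ == "__main__":
    # Synthetic stand-in sample with an arbitrary 20x20 shape (hypothetical).
    # To measure the real cost, replace `path` with one CSV from your data directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv")
        np.savetxt(path, np.random.default_rng(0).random((20, 20)), delimiter=',')
        seconds, shape = time_csv_parse(path)
        print(f"parsed {shape} in {seconds * 1000:.2f} ms per call")
```

Multiplying the per-sample time by the number of training samples gives a rough idea of how much of each epoch is spent parsing text, assuming samples are loaded in a single process.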

zkzzhou commented 1 year ago

Traceback

Yes, the same happens to me. Has this been solved?

Henry-Avery commented 1 year ago

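It's just slow: nothing is logged until a whole epoch has finished training. The results look strange, though, and I suspect the data is not being read in correctly.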
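One quick way to test the suspicion that the data is not read correctly is to look at the label distribution: if every sample ends up with the same label, a model can reach 100% accuracy with zero loss without learning anything, which would match the Epoch 2 numbers reported above. A minimal sketch; the helper names and label lists are hypothetical.

```python
from collections import Counter


def label_histogram(labels):
    """Count how many samples were assigned to each class index."""
    return Counter(int(y) for y in labels)


def all_same_label(labels):
    """True when every sample shares one label, which makes 100% accuracy trivial."""
    return len(label_histogram(labels)) <= 1


if __name__ == "__main__":
    # Hypothetical label lists; in practice, gather the labels from a few batches.
    healthy = [0, 3, 1, 3, 2, 0]
    broken = [5, 5, 5, 5, 5, 5]
    print(label_histogram(healthy), all_same_label(healthy))
    print(label_histogram(broken), all_same_label(broken))
```

Because int() also accepts 0-d PyTorch tensors, a batch of labels can be passed in directly, assuming each batch from tensor_loader unpacks into inputs and labels.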

xyanchen commented 1 year ago

The long time per epoch is expected: the Widar dataset is split into 546 batches, and one epoch takes roughly 3 minutes on a single RTX 4090. If you want to see training progress within an epoch, you could add tqdm to run.py.
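A minimal sketch of the tqdm suggestion: wrapping the batch iterable in tqdm prints a per-batch progress bar, so progress is visible long before the epoch ends. The run_epoch helper and the toy step function are hypothetical stand-ins for the loop in run.py, and the no-op fallback only exists so the sketch runs where tqdm is not installed.

```python
try:
    from tqdm import tqdm  # progress-bar library suggested in the comment above
except ImportError:  # no-op fallback so the sketch still runs without tqdm
    def tqdm(iterable, **kwargs):
        return iterable


def run_epoch(batches, step_fn, epoch):
    """Hypothetical training-loop skeleton: call step_fn on each batch and
    return the mean of the values it returns, showing a per-batch progress bar."""
    total, count = 0.0, 0
    for batch in tqdm(batches, desc=f"Epoch {epoch}", unit="batch"):
        total += step_fn(batch)
        count += 1
    return total / max(count, 1)


if __name__ == "__main__":
    # Toy stand-in for the real loader (546 batches) and training step.
    print(run_epoch(range(546), lambda b: 0.0, epoch=1))
```

In run.py this would amount to changing the loop shown in the traceback to something like `for data in tqdm(tensor_loader, desc=f"Epoch {epoch + 1}"):`, where the name `epoch` is an assumption about how run.py counts epochs.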

Because our code was written on Linux, the category of each data sample is extracted by splitting its path on '/'. On Windows, you need to replace the forward slash with a backslash (written as '\\' in a Python string literal) so that each data sample gets the correct label (in dataset.py). For example, change y = self.category[sample_dir.split('/')[-2]] to y = self.category[sample_dir.split('\\')[-2]].
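To illustrate why the split character matters, this sketch compares the original split('/')[-2] with a separator-agnostic alternative on hypothetical paths. It assumes, without having checked dataset.py, that Windows sample paths can mix separators (e.g. a '/'-joined root followed by '\\'-separated glob results); in that case split('/')[-2] can return the same string for every sample regardless of class, which would be one plausible explanation for the 100% accuracy above. pathlib.PureWindowsPath accepts both '/' and '\\' as separators on any OS, so .parent.name returns the folder that directly contains the file either way. The helper names and paths are hypothetical, and if self.category is built from folder paths with the same split('/'), it presumably needs the same change.

```python
from pathlib import PureWindowsPath


def label_key_posix_split(sample_dir):
    """Original approach: second-to-last component after splitting on '/'."""
    return sample_dir.split('/')[-2]


def label_key_portable(sample_dir):
    """Name of the folder that directly contains the sample file.

    PureWindowsPath treats both '/' and '\\' as separators on any OS, so this
    gives the same answer for Linux-style, Windows-style and mixed paths.
    """
    return PureWindowsPath(sample_dir).parent.name


if __name__ == "__main__":
    # Hypothetical paths; "ClassA"/"ClassB" stand in for real class folders.
    for p in ["data/train/ClassA/s1.csv",      # Linux-style
              "data/train\\ClassA\\s1.csv",    # mixed separators
              "data/train\\ClassB\\s2.csv"]:   # mixed separators, other class
        print(f"{p!r}: split('/') -> {label_key_posix_split(p)!r}, "
              f"portable -> {label_key_portable(p)!r}")
```

For the two mixed-separator paths, split('/')[-2] returns 'data' for both classes, while the portable version returns 'ClassA' and 'ClassB'. The maintainers' split('\\') fix also yields the class folder for such paths; the pathlib variant simply runs unchanged on both systems.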

Marsrocky commented 1 year ago

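The whole codebase was written and tested on Linux. PyTorch and the rest of the environment must be set up with exactly the versions we specify: PyTorch, numpy, torchvision and the like differ a lot between versions, and with the wrong versions you can run into baffling bugs that are impossible to debug.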
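To compare a local environment against the versions the maintainers specify, a minimal sketch using the standard library's importlib.metadata (Python 3.8+) can list what is actually installed. The helper name and the default package list are assumptions; adjust the list to whatever the repository pins.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "torchvision", "numpy")):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, ver in installed_versions().items():
        print(name, ver if ver is not None else "not installed")
```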

xyanchen / WiFi-CSI-Sensing-Benchmark

Running on Windows 10 with a single 3070 GPU: no training log is printed and no error is raised #3