train.py : RuntimeError: invalid argument 1:

zlckanata / DeepGlobe-Road-Extraction-Challenge

D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction

http://openaccess.thecvf.com/content_cvpr_2018_workshops/papers/w4/Zhou_D-LinkNet_LinkNet_With_CVPR_2018_paper.pdf

MIT License

646 stars, 195 forks

train.py : RuntimeError: invalid argument 1: #2

Closed: zxshi closed this issue 6 years ago

zxshi commented 6 years ago

I get this error when I run train.py. Have the BUPT experts run into it before? How can I fix it?

Traceback (most recent call last):
  File "ttest.py", line 71, in <module>
    data_loader_iter = iter(data_loader)
  File "/home/Software/anaconda3/envs/python-3.5/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 310, in __iter__
    return DataLoaderIter(self)
  File "/home/Software/anaconda3/envs/python-3.5/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 180, in __init__
    self._put_indices()
  File "/home/Software/anaconda3/envs/python-3.5/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 219, in _put_indices
    indices = next(self.sample_iter, None)
  File "/home/Software/anaconda3/envs/python-3.5/lib/python3.5/site-packages/torch/utils/data/sampler.py", line 119, in __iter__
    for idx in self.sampler:
  File "/home/Software/anaconda3/envs/python-3.5/lib/python3.5/site-packages/torch/utils/data/sampler.py", line 50, in __iter__
    return iter(torch.randperm(len(self.data_source)).long())
RuntimeError: invalid argument 1: must be strictly positive at /opt/conda/conda-bld/pytorch_1512954043090/work/torch/lib/TH/generic/THTensorMath.c:2184

zlckanata commented 6 years ago

This is most likely a data problem. I suggest checking the data in the "dataset/train/" folder. If you unpack the DeepGlobe training data there, it should run directly.

zxshi commented 6 years ago

OK, thanks!

1. I first tried training with the DeepGlobe data, and it worked. With Python 3, however, you run into this problem:

Traceback (most recent call last):
  File "train.py", line 44, in <module>
    for img, mask in data_loader_iter:
  File "/home/szx/Software/anaconda3/envs/python-3.5/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 210, in __next__
    return self._process_next_batch(batch)
  File "/home/szx/Software/anaconda3/envs/python-3.5/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 230, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
  File "/home/Software/anaconda3/envs/python-3.5/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 42, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/Software/anaconda3/envs/python-3.5/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 42, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/media/files/DeepGlobe-Road-original/data.py", line 125, in __getitem__
    id = self.ids[index]
TypeError: 'map' object is not subscriptable

The fix is to convert trainlist to a list in data.py; see the third line below:

    class ImageFolder(data.Dataset):
        def __init__(self, trainlist, root):
            self.ids = list(trainlist)
            self.loader = default_loader
            self.root = root

2. I will keep trying with my own data and report back once I have results.
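For readers who want to reproduce the `'map' object is not subscriptable` error outside the training script, here is a minimal sketch. The file names are hypothetical, and the snippet only mirrors the `x[:-8]` slicing seen later in this thread; it is not the repository's code.

```python
# In Python 3, map() returns a lazy iterator rather than a list,
# so it cannot be indexed the way a Dataset's __getitem__ needs.
names = ["100_sat.jpg", "101_sat.jpg"]  # hypothetical file names

ids_lazy = map(lambda x: x[:-8], names)
try:
    ids_lazy[0]
    indexing_failed = False
except TypeError as err:
    indexing_failed = True
    print(err)

# Materializing the iterator with list() makes indexing and len() work.
ids = list(map(lambda x: x[:-8], names))
print(ids[0], len(ids))
```

In Python 2, `map` returned a list, which is presumably why the original code worked there without the conversion.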

zxshi commented 6 years ago

The label images in my data have a bit depth of 8, while DeepGlobe's have 24. Could that be the cause of the error?

zlckanata commented 6 years ago

If you are using your own data, the file naming is probably different from DeepGlobe's. Judging from the first error you reported, the dataloader did not get any data list, meaning that imagelist and trainlist on lines 21 and 22 of train.py are both empty. I suggest printing len(imagelist) and len(trainlist); both should be 0. Fix: modify lines 21 and 22 of train.py, as well as lines 92 and 93 of data.py. https://github.com/zlkanata/DeepGlobe-Road-Extraction-Challenge/blob/d274cdcb34eb93798f8ec17c7f5a200e70a2b969/train.py#L21-L22 https://github.com/zlkanata/DeepGlobe-Road-Extraction-Challenge/blob/d274cdcb34eb93798f8ec17c7f5a200e70a2b969/data.py#L92-L93
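The length check suggested above could be sketched roughly as follows. The helper names `list_train_ids` and `require_non_empty` are hypothetical, and the `sat` marker and 8-character suffix are taken from the filter/map snippet quoted later in this thread; adjust both if your own files are named differently. The demo uses a temporary directory so it runs anywhere.

```python
import os
import tempfile


def list_train_ids(root, marker="sat", suffix_len=8):
    """Collect image file names containing `marker` and strip the suffix to get ids."""
    image_names = sorted(n for n in os.listdir(root) if marker in n)
    train_ids = [n[:-suffix_len] for n in image_names]
    return image_names, train_ids


def require_non_empty(image_names, train_ids, root):
    """Fail early with a readable message instead of a sampler error later."""
    if not image_names or not train_ids:
        raise ValueError(
            "no training files matched in {!r}; check the file naming".format(root)
        )


# Demo with hypothetical files named in the DeepGlobe style.
with tempfile.TemporaryDirectory() as root:
    for name in ("7_sat.jpg", "7_mask.png"):
        open(os.path.join(root, name), "w").close()
    demo_images, demo_ids = list_train_ids(root)
    print(len(demo_images), len(demo_ids))
    require_non_empty(demo_images, demo_ids, root)
```

Building real lists rather than lazy iterators also means `len()` can be printed without side effects.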

zxshi commented 6 years ago

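Thanks a lot. After renaming the images and labels to the DeepGlobe format, it runs. The label bit depth (8 vs. 24) makes no difference, but with 8-bit labels the background has to be 0 and the other class 255 for training to work. (At first I set the background to 0 and the label to 1 and trained on 38 images; the test output was completely black, and it was also completely black when opened in ArcGIS.) Next I plan to run experiments on a large batch of data, and I am hoping for good results.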

zlckanata commented 6 years ago

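You're welcome! In the labels that DeepGlobe provides, the background is 0 and roads are 255, so after a label is read it is divided by 255 as "normalization". https://github.com/zlkanata/DeepGlobe-Road-Extraction-Challenge/blob/d274cdcb34eb93798f8ec17c7f5a200e70a2b969/data.py#L111 Good luck with your results!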
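To illustrate why labels encoded as 0/1 produce an all-black result under this normalization, here is a small sketch. The function names and the 0.5 threshold are assumptions made for illustration; only the division by 255 comes from the thread.

```python
import numpy as np


def normalize_mask(mask_uint8, threshold=0.5):
    """Divide by 255, then binarize (the threshold is an assumed value)."""
    mask = mask_uint8.astype(np.float32) / 255.0
    return (mask >= threshold).astype(np.float32)


def to_0_255(mask_uint8):
    """Hypothetical helper: map any non-zero label value to 255."""
    return np.where(mask_uint8 > 0, 255, 0).astype(np.uint8)


labels_0_255 = np.array([[0, 255], [255, 0]], dtype=np.uint8)
labels_0_1 = np.array([[0, 1], [1, 0]], dtype=np.uint8)

road_pixels_255 = normalize_mask(labels_0_255).sum()          # road pixels survive
road_pixels_1 = normalize_mask(labels_0_1).sum()              # 1/255 is far below 0.5
road_pixels_fixed = normalize_mask(to_0_255(labels_0_1)).sum()
print(road_pixels_255, road_pixels_1, road_pixels_fixed)
```

With 0/1 labels every pixel normalizes to background, so the network only ever sees empty masks.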

zxshi commented 6 years ago

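Is no validation set used during training? After downloading the validation set I found that it contains only sat images and no labels. Is that right?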

zlckanata commented 6 years ago

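The organizers do not provide labels for the validation set. You need to submit your predicted labels for the validation set to the official website, which then returns a score.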

zxshi commented 6 years ago

Understood, thank you for your reply!

zxshi commented 6 years ago

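Do you know why some areas are not detected? For example, in the case above, the main road is detected, but the road that crosses it from top to bottom is only half detected. Question 1: there is another road to the left of the small road that is not detected at all, which seems unreasonable. Question 2: could you give me some advice on how to improve the road that is only half detected? Thanks for your reply.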

zlckanata commented 6 years ago

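On your first question: if this road is not detected at all, the network is deciding with fairly high confidence that it is not a road it should detect. Check whether similar areas in the dataset are actually labeled as roads. On your second question: a half-detected road is also a case of insufficient confidence (for example, between 0.4 and 0.6), but the output is binarized with a hard threshold, so the road looks broken. You can apply post-processing such as CRF to the probability map to smooth the overall confidence, though this step needs to be handled carefully. Alternatively, you can apply graph algorithms to the binarized map (see the post-processing part of https://github.com/snakers4/spacenet-three).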
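The "broken road" effect of hard binarization can be seen on a toy example. The probabilities below are made up, and lowering the threshold is shown only to make the effect visible; it is neither the CRF smoothing nor the graph-based post-processing mentioned above.

```python
import numpy as np

# Made-up confidences along one road; the middle dips to around 0.4-0.6.
prob_row = np.array([0.9, 0.8, 0.55, 0.45, 0.42, 0.58, 0.85, 0.9])


def binarize(prob, threshold=0.5):
    """Hard-threshold a probability map into a 0/1 mask."""
    return (prob >= threshold).astype(np.uint8)


def count_segments(binary_row):
    """Count runs of consecutive 1s, i.e. separate road pieces."""
    padded = np.concatenate(([0], binary_row))
    return int(np.sum((padded[1:] == 1) & (padded[:-1] == 0)))


segments_at_05 = count_segments(binarize(prob_row, 0.5))  # gap in the middle
segments_at_04 = count_segments(binarize(prob_row, 0.4))  # road stays connected
print(segments_at_05, segments_at_04)
```

A lower global threshold usually also admits more false positives elsewhere, which is one reason smoothing or graph-based repair tends to be preferred.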

zxshi commented 6 years ago

Thank you very much!

l53ma commented 4 years ago

Hello, I have the same problem when running train.py with Python 3. Here is the error:

Traceback (most recent call last):
  File "train.py", line 39, in <module>
    num_workers=4)
  File "/home/ev1-ws4/anaconda3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 176, in __init__
    sampler = RandomSampler(dataset)
  File "/home/ev1-ws4/anaconda3/lib/python3.5/site-packages/torch/utils/data/sampler.py", line 66, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

I printed the lengths of "imagelist" and "trainlist", which are 6226 and 0, respectively.

    imagelist = filter(lambda x: x.find('sat')!=-1, os.listdir(ROOT))  # length: 6226
    trainlist = map(lambda x: x[:-8], imagelist)  # length: 0
    x = list(imagelist)
    y = list(trainlist)
    print(len(x), len(y))

Any ideas how to solve this problem? Many thanks.
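One plausible explanation for the 6226 and 0 above, sketched with hypothetical file names: in Python 3 both `filter` and `map` are lazy, single-use iterators, so `list(imagelist)` consumes the filter before `trainlist` gets to read from it. Whether this is also the cause of the `num_samples=0` error in the actual training run is an assumption; the snippet only reproduces the debugging output.

```python
names = ["1_sat.jpg", "1_mask.png", "2_sat.jpg", "2_mask.png"]  # hypothetical

# Same pattern as in the question: trainlist reads lazily from imagelist.
imagelist = filter(lambda x: x.find("sat") != -1, names)
trainlist = map(lambda x: x[:-8], imagelist)
lazy_x = list(imagelist)  # this drains the filter iterator
lazy_y = list(trainlist)  # nothing is left for map to read
print(len(lazy_x), len(lazy_y))

# Building real lists keeps both sequences intact and reusable.
imagelist = [n for n in names if n.find("sat") != -1]
trainlist = [n[:-8] for n in imagelist]
print(len(imagelist), len(trainlist))
```

With list comprehensions, both lengths can be printed for debugging and the lists still hold their contents when passed on to the dataset.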