ownthink / KnowledgeGraphData

The largest Chinese knowledge graph to date (140 million entries), open-sourced for download
https://www.ownthink.com/
4.93k stars 726 forks

_csv.Error: line contains NULL byte #28

Closed hjing100 closed 2 years ago

hjing100 commented 2 years ago
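import csv

with open('ownthink_v2.csv', 'r', encoding='utf8') as fin:
    reader = csv.reader(fin)
    for index, read in enumerate(reader):
        pass  # process each row here

Hello, when I run the reading code above, it fails with _csv.Error at some line partway through the file.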

ownthink commented 2 years ago

Wrap it in a try block.

hjing100 commented 2 years ago

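The error is raised on the line `for index, read in enumerate(reader)` itself. If I wrap that in a try, it seems the rest of the for loop can no longer continue? What I want is the effect of `continue`.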

ownthink commented 2 years ago

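As I recall, it was written as a `while True` loop that reads one row at a time and wraps each read in a try, so a row that raises is simply skipped. You can search for it; it has been a while and I don't remember where the code is.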
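The pattern described above can be sketched roughly as follows. This is a minimal sketch, not the maintainer's original code: the helper name `iter_good_rows` is hypothetical, and it assumes the `csv.reader` object can keep going after one row raises `csv.Error`, which is what the while-loop workaround relies on.

```python
import csv


def iter_good_rows(lines, **reader_kwargs):
    """Yield (line_num, row) pairs, skipping rows on which csv.reader raises.

    `lines` is any iterable of strings, for example an open text file.
    `iter_good_rows` is a hypothetical helper name used for illustration.
    """
    reader = csv.reader(lines, **reader_kwargs)
    while True:
        try:
            row = next(reader)
        except StopIteration:
            break  # end of input
        except csv.Error as exc:
            # Same effect as `continue` in a for loop: report and move on.
            print(f'Skipping line {reader.line_num}: {exc}')
            continue
        yield reader.line_num, row
```

For the file in this issue, one would pass the open file object, e.g. `for line_num, row in iter_good_rows(fin): ...`. Calling `next(reader)` inside the try is what makes skipping possible: in a plain `for` loop the exception is raised by the loop statement itself, so there is nowhere inside the loop body to catch it.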

 

hjing100 commented 2 years ago

OK, thanks. I'll look into it.

kLiHz commented 2 years ago

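> As I recall, it was written as a `while True` loop that reads one row at a time and wraps each read in a try, so a row that raises is simply skipped. You can search for it; it has been a while and I don't remember where the code is.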

There appears to be a \x00 NULL byte at around line 9929226:

>>> with open('D:/temp/ownthink_v2/ownthink_v2.csv', 'r') as f:
...   i = 1
...   while i < 9929228:
...     l = f.readline()
...     if i > 9929225:
...       l
...     i += 1
...
'杂剧石棺,"出土\x00时间",1978年\n'
'杂剧石棺,现存地,河南博物院\n'
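To find such lines without stepping through the file by hand, something like the following could be used. It is only a sketch: `find_nul_lines` and its `limit` parameter are hypothetical names, and it counts lines by `\n` in binary mode, so the numbers may differ slightly from a text-mode count if the file contains unusual line endings.

```python
def find_nul_lines(path, limit=10):
    """Return up to `limit` 1-based line numbers whose raw bytes contain a NUL byte.

    `find_nul_lines` is a hypothetical helper name used for illustration.
    """
    hits = []
    # Binary mode: nothing is decoded, so a bad byte cannot raise a codec error.
    with open(path, 'rb') as f:
        for line_num, raw in enumerate(f, start=1):
            if b'\x00' in raw:
                hits.append(line_num)
                if len(hits) >= limit:
                    break  # stop early once enough hits are collected
    return hits
```

Calling `find_nul_lines('ownthink_v2.csv')` would then list the first few affected line numbers in the extracted file.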

This is how I worked around it on my end; not sure whether it helps:

import csv
import zipfile
from io import TextIOWrapper

# https://stackoverflow.com/questions/26942476/reading-csv-zipped-files-in-python
# https://stackoverflow.com/questions/50259792/reading-csv-files-from-zip-archive-with-python-3-x

zippedFileName = 'C:/Users/Henry/Downloads/ownthink_v2.zip'
pwd = 'https://www.ownthink.com/'  # archive password

with zipfile.ZipFile(zippedFileName) as archive:
    # Read the CSV straight from the password-protected archive, without extracting it.
    with archive.open('ownthink_v2.csv', pwd=pwd.encode('utf-8')) as f:
        with TextIOWrapper(f, encoding='utf-8', newline='') as wrappedF:
            reader = csv.reader(wrappedF)
            linesToRead = 10  # only read the first few rows as a demonstration
            while linesToRead > 0:
                try:
                    row = next(reader)
                    print(row)
                except StopIteration:
                    break  # end of file
                except csv.Error:
                    # Malformed row (e.g. one containing a NULL byte): report it and keep going.
                    print(f'Error at line: {reader.line_num}')
                finally:
                    linesToRead -= 1
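
An alternative to skipping the bad row is to drop the NUL characters before the text reaches `csv.reader`, so the row is kept. This is a sketch of a different technique from the one above, not something confirmed in this thread: `strip_nul` and `read_rows_without_nul` are hypothetical names, and it assumes that silently removing `\x00` from a field is acceptable for your use of the data.

```python
import csv


def strip_nul(lines):
    """Yield each line with NUL characters removed (hypothetical helper)."""
    for line in lines:
        yield line.replace('\x00', '')


def read_rows_without_nul(lines):
    """Return a csv.reader over `lines` with NUL characters filtered out.

    `lines` is any iterable of strings, for example the TextIOWrapper above.
    """
    return csv.reader(strip_nul(lines))
```

With the code above, this would amount to replacing `csv.reader(wrappedF)` with `read_rows_without_nul(wrappedF)`. The trade-off is that the affected field is modified rather than reported, so it may be worth logging which lines were changed.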