Open b54bdd18-0ffa-412a-9e55-4a92cef9e562 opened 11 years ago
When sniffing the dialect of a file created with the csv module using dialect=csv.excel_tab, if one of the rows contains a quote ("), the delimiter is set to ' ' instead of '\t'.
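For reference, a minimal reproduction sketch for Python 3. The rows are hypothetical (modelled on the line discussed in this issue), and an in-memory buffer stands in for the real file. The sniffed value is printed rather than asserted, since it may differ between Python versions.

```python
import csv
import io

# Hypothetical rows; only the last field of the first row contains quotes.
ROWS = [
    ["273", "MVREGR1", "ByEuPo", 'Baryton "Euphonium" populaire'],
    ["274", "MVREGR2", "ByEuCl", "Baryton Euphonium classique"],
]


def build_sample(rows):
    """Write rows with the excel_tab dialect and return the resulting text."""
    # newline="" keeps the writer's own line terminators untouched,
    # the same way open(path, "w", newline="") would for a real file.
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, dialect=csv.excel_tab)
    writer.writerows(rows)
    return buffer.getvalue()


def sniff_delimiter(sample):
    """Return the sniffed delimiter, or None if the sniffer gives up."""
    try:
        return csv.Sniffer().sniff(sample).delimiter
    except csv.Error:
        return None


sample = build_sample(ROWS)
print(repr(sample))
print("sniffed delimiter:", repr(sniff_delimiter(sample)))
```

The writer doubles the embedded quotes and wraps the field in quotes, which is what later confuses the sniffer.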
I had a look at this and have the following remarks.
1) The file csv_sniffing_excel_tab.py no longer works with Python 3.3. It now produces the following traceback:
Traceback (most recent call last):
  File "csv_sniffing_excel_tab.py", line 36, in <module>
    create_file()
  File "csv_sniffing_excel_tab.py", line 23, in create_file
    writer.writerows(test_data)
TypeError: 'str' does not support the buffer interface
2) The problem seems to be in the _guess_quote_and_delimiter method. If you always call _guess_delimiter, the sniffer gives the correct result.
3) As far as I understand, the problem is the first regular expression: (?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)
Now if we have a line like the following:
273:MVREGR1:ByEuPo:"Baryton ""Euphonium"" populaire"
The delim group will match the space after Baryton, the space group will match nothing, the quote group will match ", and the non-group pattern will match "Euphonium", followed by the quote group matching " again and the delim group matching the space before populaire.
And so we get the wrong delimiter.
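The matching described above can be checked directly with the re module. This is only a sketch: the pattern is copied from this comment, and the sample line assumes the fields are separated by tabs, as the excel_tab dialect would write them.

```python
import re

# First pattern as quoted in this issue.
PATTERN = re.compile(
    r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)'
)

# Assumption: tab-separated version of the line shown above.
LINE = '273\tMVREGR1\tByEuPo\t"Baryton ""Euphonium"" populaire"'

match = PATTERN.search(LINE)
print("matched text:", repr(match.group(0)))
print("delim:", repr(match.group("delim")))
print("space:", repr(match.group("space")))
print("quote:", repr(match.group("quote")))
```

The tab in front of the quoted field cannot match, because no closing quote is followed by another tab on this line, so the first successful match is the one starting at the space.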
I included a patch (against 2.7) that seems to make the test work.
The patch prohibits the delim group from matching a space.
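To illustrate the idea (this is not the attached patch itself, only a sketch assuming the change amounts to adding a space to the characters excluded from the delim group), the two patterns can be compared on the problematic line. The names ORIGINAL and PATCHED and both sample lines are hypothetical.

```python
import re

# Pattern as quoted earlier in this issue.
ORIGINAL = re.compile(
    r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)'
)
# Hypothetical variant: a space is added to the excluded character class,
# so the delim group can no longer match a space.
PATCHED = re.compile(
    r'(?P<delim>[^\w\n"\' ])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)'
)

# Assumption: tab-separated version of the line from this issue.
tab_line = '273\tMVREGR1\tByEuPo\t"Baryton ""Euphonium"" populaire"'
# An ordinary comma-separated line with a quoted field, for comparison.
comma_line = '1,"a, b",2'

for name, pattern in (("original", ORIGINAL), ("patched", PATCHED)):
    for line in (tab_line, comma_line):
        found = pattern.search(line)
        delim = found.group("delim") if found else None
        print(name, repr(line), "->", repr(delim))
```

With the variant, the tab-separated line yields no match for this pattern, which would leave the decision to the later steps of the sniffer. One likely trade-off is that files that really are space-delimited with quoted fields would no longer be recognised by this pattern either.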
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-bug']
title = 'csv.Sniffer.snif doesn\'t set up the dialect properly for a csv created with dialect=csv.excel_tab and containing quote (") char'
updated_at =
user = 'https://bugs.python.org/GhislainHivon'
```
bugs.python.org fields:
```python
activity =
actor = 'Antoon.Pardon'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation =
creator = 'GhislainHivon'
dependencies = []
files = ['30001']
hgrepos = []
issue_num = 17829
keywords = []
message_count = 3.0
messages = ['187709', '214800', '215031']
nosy_count = 3.0
nosy_names = ['GhislainHivon', 'Antoon.Pardon', 'dmi.baranov']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue17829'
versions = ['Python 2.7', 'Python 3.2']
```