Closed aleksejrs closed 8 months ago
IsADirectoryError: [Errno 21] Is a directory: '…/fsrs/fsrs4anki.wiki.txt'
I haven't considered files that have double extensions like this. I'll look into it.
@aleksejrs I'm not able to reproduce the problem.
With this file structure:
ja-test-report/
- fsrs/
- fsrs4anki.wiki.txt
- test_ja_1.txt
- test_ja_2.txt
When I select the ja-test-report
directory then I successfully get a report on all three files.
Create a directory name "directory.txt" in a Readability Analyzer corpus directory.
Oh wait, did you give a directory an extension? If that's the case then it's unsupported behavior.
Can't it just check? The directory was a copy of the wiki where I added ".txt" to ".md" files for MorphMan.
Can't it just check?
I mean, in theory it's probably possible. That being said, I have never seen, nor heard of a directory having an extension before, so adding a new check for that would unnecessarily slow down an already slow process imo. Better to just remove the extension.
Btw, that is an amazing QA test, I'll definitely remember it.
It could be something like
try:
with open(input_file, encoding="utf-8") as file:
…
except IsADirectoryError:
continue
So it would not check until the error happens.
@aleksejrs sure, but catching every possible exception that could occur from user errors is not desirable--the code complexity would get unwieldy really fast. In this case I actually think ignoring the error is straight up bad, because here the error would let the user know that they have fundamental problem with their file structure, and then they can fix said structure and prevent future problems that could arise from having malformed directory names.
@mortii If it just skips processing the directory as a file and still processes the files within it, I see no fundamental problem with the file structure.
A Windows user would think a question mark is a problem with the file structure (as Anki does when exporting cards).
What if my directory was from a Moldovan website and ended in .md
?
The directory was a copy of the wiki where I added ".txt" to ".md" files for MorphMan.
@aleksejrs Could you give me the steps you used to copy a wiki that ended up with having a directory with an extension?
@mortii I made a copy of the wiki, renamed the files in it, and renamed the directory to indicate that it contains txt files.
.md or any other TLD in a file name, however, would come from a downloader like httrack or DownThemAll!.
Directories do not have extensions because there is no meaning assigned to an extension in a directory's name, so whatever is at the end of it is just not an extension, but part of a string. Not because there is something wrong with that and it would look like a file (although it would in "ls|less" — that's why nobody uses that to find files to process).
.txt may have been a somewhat silly choice, but if any other dot in a directory file name can trip some program up, am I supposed to never use dots in a directory's name? I use anything possible but a line break or an invisible character or "|" (as long as it is not a tree of configuration files or something). The only programs I wouldn't be surprised to fail to distinguish directories and files are my own quick-and-dirty find
commands, not a program that reads files in a directory of arbitrary books and articles.
MorphMan uses os.walk, which yields a 3-tuple (dirpath, dirnames, filenames):
def accepted_filetype(filename):
return filename.lower().endswith(('.srt', '.ass', '.txt', '.corpusdb'))
list_of_files = list()
for (dirpath, _, filenames) in os.walk(input_path):
list_of_files += [os.path.join(dirpath, filename) for filename in filenames if accepted_filetype(filename)]
https://docs.python.org/3/library/stdtypes.html#str.endswith
@aleksejrs we only need the file names so os.walk does indeed seem like a better choice than glob. I'll try replacing it.
this now works with file structures like: fsrs.MD/fsrs.txt/fsrs.wiki.txt
next release is probably within a couple of days.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Describe the bug A directory with .txt trips RRG up.
To Reproduce Steps to reproduce the behavior:
Desktop (please complete the following information):