marschall / memoryfilesystem

An in memory implementation of a JSR-203 file system

Case-insensitive setting loses case information? #56

Closed io7m closed 9 years ago

io7m commented 9 years ago

Hello!

I realize the title of the ticket is a little strange, but I feel that the "case-insensitive" setting of the memory filesystem may be too strong. That is, when enabling a case-insensitive filesystem via MemoryFileSystemBuilder.setCaseSensitive(false), the case of the original filename appears to be lost. Rather than just performing comparisons of filenames in a case-insensitive manner, it seems that all names are translated to uppercase before being written to "disk".

Why does this matter? Let's say you're writing a program to catalog disks. You walk a filesystem, reading filenames and metadata into a graph/tree structure (effectively, a read-only filesystem that does not store actual file data). In order to allow the graph/tree structure to be filesystem-agnostic, the names in the tree structure have to be case sensitive. If the original filesystem was case insensitive, there is no problem, because files FILE0.TXT and FiLe0.TxT cannot both be in the same directory. If the original filesystem was case sensitive, there is no problem because the graph/tree representation is also case sensitive.

The user creates a file called File0.txt on a case insensitive filesystem. The user then runs the catalog program on this filesystem. Later, the user tries to open File0.txt in the catalog: Uh oh! The filesystem quietly converted the name to FILE0.TXT when the catalog was created and the file cannot be found! This leads to subtle problems when writing test suites that are parameterized by filesystems: The tests have to always assume completely uppercase filenames to be able to check for the same results across filesystems. On a real case-insensitive filesystem such as NTFS, this does not happen.

I believe the correct behaviour should be to return filenames using the case that was used when they were created, but to do case-insensitive string comparisons when it becomes time to compare names.
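A minimal sketch of that behaviour, using only the JDK and not memoryfilesystem's actual implementation: names are stored exactly as created, and only the comparison ignores case. The class name `CasePreservingDirectory` and its methods are hypothetical, and `String.CASE_INSENSITIVE_ORDER` is a simplification that is not locale-aware.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical directory model: case-insensitive lookup, case-preserving storage. */
public class CasePreservingDirectory {

    // Keys are compared case-insensitively; the value keeps the spelling used at creation.
    private final Map<String, String> entries = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    /** Creates an entry; returns false if a name differing only in case already exists. */
    public boolean create(String name) {
        return entries.putIfAbsent(name, name) == null;
    }

    /** Looks a name up in any case and returns the spelling it was created with. */
    public Optional<String> storedName(String name) {
        return Optional.ofNullable(entries.get(name));
    }
}
```

With this model, creating `File0.txt` and then looking up `file0.txt` finds the entry and reports it as `File0.txt`, while a later attempt to create `FILE0.TXT` is rejected as a duplicate.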

marschall commented 9 years ago

There are two things at work here: case sensitivity and case preservation. These are two different concepts and therefore two different flags. If you additionally want case preservation, you need to use the following flag.

   .setStoreTransformer(StringTransformers.IDENTIY)

That said, I just found out that #newDirectoryStream was not returning the original file name even with this flag. I just committed a fix for this. The following test passes now:

  @Test
  public void preserveCase() throws IOException {
    FileSystem fileSystem = this.rule.getFileSystem(); // built with MemoryFileSystemBuilder#newWindows()
    Path originalPath = fileSystem.getPath("C:\\File0.txt");
    Files.createFile(originalPath);
    assertThat(fileSystem.getPath("C:\\file0.txt"), exists());
    boolean found = false;
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(originalPath.getParent())) {
      for (Path each : stream) {
        if ("File0.txt".equals(each.getFileName().toString())) {
          found = true;
          break;
        }
      }
    }
    assertTrue(found);
  }

Do you have other cases that were not behaving as expected? If not I would release this version.

ghost commented 9 years ago

memoryfilesystem in Windows mode is currently case-insensitive and non-case-preserving, i.e. it mimics FAT in that respect. NTFS is case-insensitive and case-preserving. A good summary is at https://en.wikipedia.org/wiki/Case_preservation.

So it is not really a question of adhering to Windows, but of whether to follow FAT or NTFS. Maybe an extra option for the case-preserving setting would handle that.

Note: the native OS X filesystem, HFS+, is actually also case-insensitive and case-preserving, whereas most Linux filesystems are case-sensitive. To simulate real situations, it might be better to offer specific filesystems as options.
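The distinction drawn in this comment can be sketched with two independent transformations: one deciding which names count as equal, the other deciding which spelling is reported back. This is an illustrative model only; `TwoFlagDirectory` and its factory methods are hypothetical names, the behaviour of each filesystem follows the characterization given in this thread, and none of it is memoryfilesystem's API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical model with separate "lookup" and "store" name transformations. */
public class TwoFlagDirectory {

    private static final UnaryOperator<String> UPPER = s -> s.toUpperCase(Locale.ROOT);

    private final UnaryOperator<String> lookup; // decides which names are "the same"
    private final UnaryOperator<String> store;  // decides which spelling is reported back
    private final Map<String, String> entries = new HashMap<>();

    public TwoFlagDirectory(UnaryOperator<String> lookup, UnaryOperator<String> store) {
        this.lookup = lookup;
        this.store = store;
    }

    /** Case-insensitive, not case-preserving (FAT as described in the thread). */
    public static TwoFlagDirectory fatLike() {
        return new TwoFlagDirectory(UPPER, UPPER);
    }

    /** Case-insensitive, case-preserving (NTFS, default HFS+). */
    public static TwoFlagDirectory ntfsLike() {
        return new TwoFlagDirectory(UPPER, UnaryOperator.identity());
    }

    /** Case-sensitive, case-preserving (most Linux filesystems). */
    public static TwoFlagDirectory caseSensitive() {
        return new TwoFlagDirectory(UnaryOperator.identity(), UnaryOperator.identity());
    }

    /** Adds the name unless an equal name (per the lookup transformation) exists. */
    public void create(String name) {
        entries.putIfAbsent(lookup.apply(name), store.apply(name));
    }

    /** Returns the stored spelling for the name, or null if no such entry exists. */
    public String reportedName(String name) {
        return entries.get(lookup.apply(name));
    }
}
```

After `create("File0.txt")`, calling `reportedName("file0.txt")` yields `FILE0.TXT` for `fatLike()`, `File0.txt` for `ntfsLike()`, and `null` for `caseSensitive()`.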

marschall commented 9 years ago

That the filesystem in Windows-mode is non-case-preserving is a bug. The intention was for it to be case preserving.

The situation is actually a bit more complicated:

TL;DR: what OP wants should be what happens in the Windows mode

io7m commented 9 years ago

I hadn't realized the situation was so complicated.

@marschall I suspect the fix that makes Windows-like filesystems preserve case will be enough.

marschall commented 9 years ago

@io7m I just released 0.7.2. Can you check whether this fixes your issues?

io7m commented 9 years ago

Yes, that fixed it!

Thanks for the quick response(s).