Dangerous fault in the behaviour of cwk_path_join

ghost commented 3 years ago

The behaviour of cwk_path_join should be similar to running cd on the first argument, and cd on the last argument directly afterwards.

We can see this works with simple statements, like joining together hello and world: Output of cwk_path_join: hello/world Output of os.path.join from Python: hello/world Output of two cds: hello/world

But, if we try to do something more subtle, like joining together /hello and /world, cwalk fails catastrophically: Output of cwk_path_join: /hello/world (See https://likle.github.io/cwalk/reference/cwk_path_join.html - This flaw is even demonstrated in the documentation!) Output of os.path.join from Python: /world Output of two cds: /world

So, why is this happening? It seems that cwk_path_join doesn't actually "walk" or "separate cd" as it is intended to, but rather word-for-word smashes the two paths together.

Unless there is an alternative function that does this properly, cwk_path_join should NOT operate like this. Every argument to cwk_path_join should operate like a separate, sequenced cd.

If I run cd /hello and then cd /world, I end up in /world, because of the /. I do not end up in /hello/world. If I used /hello and then ./world (or just world), it would be /hello/world, but I did not do this.

Unreliable behaviour of cwk_path_join makes this library completely unusable for many people. I would suggest testing your function against Python's os.path functions.

likle commented 3 years ago

Hi @naphthasl ! Thanks for submitting this issue!

I believe the behaviour you are looking for is provided by cwk_path_get_absolute, which I believe generates an absolute path the way you describe. cwk_path_join is currently designed to actually join the two path segments.

Does that help you, or do you need additional functionality?

ghost commented 3 years ago

Is cwk_path_get_absolute not designed in order to find the absolute path of a relative path? Because that isn't what I'm trying to do. I do not want the output to be an absolute path.

In the vast majority of higher level languages, there is a function for joining two paths by walking through them, as I demonstrated with Python in my initial post.

The problem is that cwk_path_join does not do what every other library does do, and neither does cwk_path_get_absolute.

The important takeaway here is that joining together /hello and /world into /hello/world is not a useful operation. There are no situations where this output would be useful, because it is invalid output. Again, see the way os.path.join works in Python, it's pretty much the standard way of writing a function like this.

cwk_path_get_absolute does what cwk_path_join was supposed to do almost perfectly, except it always returns an absolute path, which is not what I need.

I appreciate that my explanation is kind of difficult to understand, but seriously, take a look at os.path.join. It's the safest (and most sane) way to implement this. Here's a demonstration of what your function SHOULD do, and what both cwk_path_get_absolute and cwk_path_join are not capable of doing:

>>> os.path.join("first", "second")
'first/second'
>>> os.path.join("/first", "/second")
'/second'
>>> os.path.join("hello/../world///foo/hello/world", "../world")
'hello/../world///foo/hello/world/../world'
>>> os.path.join("/test/world", "../")
'/test/world/../'
>>> os.path.join("sane/", "../")
'sane/../'
>>> os.path.join("./sane/", "./fast.txt")
'./sane/./fast.txt'
>>>

You will notice that os.path.join does not automatically remove the ../ and ./. That is not os.path.join's job - it does not exist to simplify your paths, it exists to walk between them. /first and /second do not become /first/second, you walk to /first and then you walk over to /second, and arrive at /second.

So, to summarize - with pretty much every other path library, if you want to walk from one directory to another, you join their paths and then normalize them. You can see how the normalization step works in Python, too:

>>> os.path.normpath('hello/../world///foo/hello/world/../world')
'world/foo/hello/world'
>>> os.path.normpath('/test/world/../')
'/test'
>>>

I'm not really sure where you went wrong here - I think you may have misunderstood the "join" operation as some kind of "normalized concatenation" which is not what a join is supposed to do, because this "normalized concatenation" is not useful for the vast majority of programs, but a "normalized join" or "normalized walk" is extremely useful.

Essentially, you really, really need a function that replicates the output of os.path.join (or equivalent) in other languages and libraries, because as of right now you do not. Something that replicates os.path.normpath(os.path.join(a, b)) would also be quite nice. Please test your functions against these other libraries, it will improve the quality and usability of this library significantly. In its current state, it will break or be unusable under many conditions/demands, and is also differs significantly from other languages and libraries, making it difficult to use.

ghost commented 3 years ago

Okay, cwk_path_get_relative seems like it might be closer to what I need. I'll feed it a bunch of tests to see if it works as expected. If it does, then that's great - I would advise you to rename it though, because it isn't immediately obvious that it does the same thing as path joining everywhere else. Maybe you could have a tip in the documentations comparing your functions to other functions elsewhere?

Note: cwk_path_get_relative still auto-normalizes the path. This is fine for my use-case, but may be problematic for others, as the normalization step ought to be optional as it is with other libraries and their join function(s).

My worry is that people might be using cwk_path_join in order to do what cwk_path_get_relative sort of does. If they are, there are many conditions in which their code could break. Since cwk_path_join isn't really useful for anything outside of the library (in comparison to cwk_path_get_relative), maybe you should make it private or rename it, as it may only confuse people and break their code.

likle commented 3 years ago

Hi @naphthasl, thanks for your detailed response and explanation. I will definitely give this a thought on how I can improve it and at least be more clear in the documentation. I understand the behaviour might be different to other libraries and this could cause confusion.

Please be careful, cwk_path_get_relative generates a relative path and does not do the same as Python's path join!

ghost commented 3 years ago

You're right, neither cwk_path_get_relative and cwk_path_get_absolute work here. Do you have a function whereby testgame/ + shaders = testgame/shaders? (Other than cwk_path_join, of course)

Also, cwk_path_get_absolute does not return a path that is absolute. Yes, the path begins with a /, but it returns /testgame/shaders, which is also highly problematic. Absolute paths start at the filesystem root. I guess it ought to output /home/naphtha/Code/HAZE/testgame/shaders - see how it starts at the actual root?

Since we've been using Python equivalents a lot lately:

>>> os.path.abspath(".")
'/home/naphtha/Code/cwalk/build'

likle commented 3 years ago

cwalk does not actually resolve files on the disk, that's because it is much more portable this way. But cwk_path_get_absolute can actually do more than python's abspath because it accepts a base path. You can simply supply your current working directory as a base path to achieve a similar result though:

#include <stdio.h>
#include <stdlib.h>
#include <cwalk.h>
#ifdef _WIN32
#include <direct.h>
#define getcwd _getcwd
#elif
#include <unistd.h>
#endif

int main(int argc, char *argv[]) {
  char buffer[FILENAME_MAX];
  char cwd[FILENAME_MAX];
  if(!getcwd(cwd, sizeof(cwd))) {
    return EXIT_FAILURE;
  }

  cwk_path_get_absolute(cwd, ".", buffer, sizeof(buffer));
  printf("%s\n", buffer); // should output /home/naphtha/Code/HAZE/testgame/shaders depending on your working directory
  return EXIT_SUCCESS;
}

ghost commented 3 years ago

Ah, I must've misunderstood how your cwk_path_get_absolute works. That's actually pretty useful. I think your solution of combining two get_absolutes and a single get_relative with getcwd() will work as a general purpose solution.

Here's an idea, though. Instead of renaming or rewriting anything, maybe there could be an extra function called cwk_fs_path_join which replicates os.path.join's functionality? Something like this, based on the code you provided earlier...

#ifndef MAX_CWD
#define MAX_CWD 4096
#endif

size_t cwk_fs_path_join(const char *base, const char *path, char *buffer, size_t buffer_size)
{
    char cwd[MAX_CWD];
    if(!getcwd(cwd, MAX_CWD)) return 0;

    cwk_path_get_absolute(cwd, base, buffer, buffer_size);
    cwk_path_get_absolute(buffer, path, buffer, buffer_size);
    return cwk_path_get_relative(cwd, buffer, buffer, buffer_size);
}

That way, you can retain the portability of the existing functions while providing easy to use alternatives for people who are looking to replicate functionality similar to other languages.

likle commented 3 years ago

I had second thoughts about that solution because it will always return a relative path from your working directory. If that works for you that's perfectly fine - however if it is an issue you could also do something more simple like this:

size_t custom_join_paths(const char *path_a, const char *path_b, char *buffer,
  size_t buffer_size)
{
  if (cwk_path_is_absolute(path_b)) {
    return cwk_path_normalize(path_b, buffer, buffer_size);
  } else {
    return cwk_path_join(path_a, path_b, buffer, buffer_size);
  }
}

The output will be normalized though, I hope that won't be an issue for you.

I appreciate your input a lot! I will have to think about what to do relating to the library interface.

ghost commented 3 years ago

Ah - I probably should've said this earlier, but my reasoning behind implicitly using the working directory was because I'd expect a lot of people would feed the output of the function straight into fopen(), where all paths are relative to the working directory or absolute from the filesystem root. I was thinking about how it would be possible to make this library a bit easier to use for people who need to make a lot of f* calls (which is a pretty big use case)

likle / cwalk

Dangerous fault in the behaviour of cwk_path_join #18