ytsaurus / ytsaurus-spyt

YTsaurus SPYT provides an integration with Apache Spark
Apache License 2.0

The problem with paths #13

Closed: MrSandmanRUS closed this issue 3 weeks ago

MrSandmanRUS commented 1 month ago

Path resolution currently goes wrong in several scenarios. Here are some examples.

Example 1:
df.write.mode(SaveMode.Overwrite).json("//home/some/path/onYT")
This path is not recognized correctly: it is resolved as "//some/path/onYT".
df.write.mode(SaveMode.Overwrite).json("//home/home/some/path/onYT")
This one works: the path is resolved as "//home/some/path/onYT".
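A plausible explanation for Example 1, assuming the string passes through a standard URI parser (Hadoop's Path, which Spark uses for file-based writers such as .json, treats a leading "//" this way): the first segment after "//" is read as a URI authority, not as part of the path. The sketch uses java.net.URI to show the split; `split` is a hypothetical helper for illustration only.

```scala
import java.net.URI

// Hypothetical helper for illustration: returns (scheme, authority, path)
// exactly as java.net.URI parses the given string.
def split(path: String): (Option[String], Option[String], String) = {
  val uri = new URI(path)
  (Option(uri.getScheme), Option(uri.getAuthority), uri.getPath)
}

// "home" is consumed as the authority, leaving only "/some/path/onYT".
println(split("//home/some/path/onYT"))      // (None,Some(home),/some/path/onYT)
// With "home" duplicated, the remaining path happens to look correct.
println(split("//home/home/some/path/onYT")) // (None,Some(home),/home/some/path/onYT)
```

This is consistent with both symptoms: the lost "home" segment in the first call, and the "workaround" of writing it twice in the second.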

Example 2:
df.write.mode(SaveMode.Overwrite).yt("yt:/home/some/path/onYT")
This path is not recognized correctly: when we try to overwrite an existing file, it is resolved as "//some/path/onYT".
df.write.mode(SaveMode.Overwrite).yt("//home/some/path/onYT")
This one works: the path is resolved as "//home/some/path/onYT".

So the problem shows up in several different situations, but the root cause appears to be the same.

Expected behavior: Paths are recognized correctly.

Alexvsalexvsalex commented 1 month ago

Hi, for tables the scheme is not yt but ytTable. Examples:

df.write.mode(SaveMode.Overwrite).json("/home/some/path/onYT") // single leading slash
df.write.mode(SaveMode.Overwrite).json("ytTable:///home/some/path/onYT") // explicit scheme
df.write.mode(SaveMode.Overwrite).yt("//home/some/path/onYT") // .yt corrects the number of slashes automatically
df.write.mode(SaveMode.Overwrite).yt("ytTable:///home/some/path/onYT") // explicit scheme
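The recommended forms can be checked with java.net.URI: none of them leaves room for "home" to be read as an authority. In this sketch, `split` and `toYtTableUri` are hypothetical helpers; `toYtTableUri` only imitates the slash correction that .yt is described as doing and is not SPYT's actual implementation.

```scala
import java.net.URI

// Hypothetical helper: (scheme, authority, path) as java.net.URI parses them.
def split(path: String): (Option[String], Option[String], String) = {
  val uri = new URI(path)
  (Option(uri.getScheme), Option(uri.getAuthority), uri.getPath)
}

// Hypothetical helper, NOT part of the SPYT API: rewrites a Cypress path
// such as "//home/some/path/onYT" into an explicit "ytTable:///..." URI,
// whose empty authority cannot swallow the first path segment.
def toYtTableUri(cypressPath: String): String = {
  require(cypressPath.startsWith("//"), s"expected a Cypress path, got $cypressPath")
  "ytTable:///" + cypressPath.stripPrefix("//")
}

val candidates = Seq(
  "/home/some/path/onYT",            // single leading slash
  "ytTable:///home/some/path/onYT",  // explicit scheme
  toYtTableUri("//home/some/path/onYT")
)
// Every candidate keeps "/home/some/path/onYT" as the path, with no authority.
candidates.foreach(p => println(s"$p -> ${split(p)}"))
```

The explicit ytTable:/// form is the most robust choice here, since the triple slash makes the empty authority unambiguous regardless of which writer method parses it.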

MrSandmanRUS commented 1 month ago

Thanks for the advice, although the current behavior is a bit confusing.

Alexvsalexvsalex commented 3 weeks ago

The changes in this commit make both cases in the second example work. Unfortunately, we cannot autocorrect paths passed to .json.