kevin940726 / remark-code-import

📝 Populate code blocks from files
https://npm.im/remark-code-import
MIT License
62 stars 11 forks source link

fix: unable to use with unix pipes #8

Open forbesmyester opened 3 years ago

forbesmyester commented 3 years ago

Plugin fails when using input from a pipe instead of (presumably a file). This is because it seems to require a path.

This patch will remove the requirement for a path, but use it when it's there

For reference, a pipeline which fails with previous code is shown below:

const unified = require('unified');
const createStream = require('unified-stream');
const parse = require('remark-parse');
const gfm = require('remark-gfm');
const remark2rehype = require('remark-rehype');
const stringifyRehype = require('rehype-stringify');
const remarkCodeImport = require('remark-code-import');
const raw = require('rehype-raw');

const processor = unified()
  .use(parse)
  .use(gfm)
  .use(remarkCodeImport)
  .use(remark2rehype, {allowDangerousHtml: true})
  .use(raw)
  .use(stringifyRehype);

process.stdin.pipe(createStream(processor)).pipe(process.stdout);
kevin940726 commented 3 years ago

I'm not sure I understand what the use case for remark-code-import is when using it with literal strings or streams? What do you expect the path in the code block meta be pointing at? I guess we can allow absolute paths if they are in the same file system though, but not sure if it's worth it 🤔.

forbesmyester commented 3 years ago

Hi Kevin, thanks for getting back to me.

Markdown is plain text. For me your plugin is only part of a larger process, of which Markdown processing is only one part. It would be non-trivial and, it seems to me, much less elegant to have to write the whole thing in JavaScript as apposed to how it is now, UNIX pipelines.

I don't think a desire to use UNIX pipes is very strange, I'm on a command line in a directory I'd fully expect to be able to cat x.md | node some-js-program-that-wraps-remark-and-your-plugin.js as much as I'd expect to be able to cat orders.csv | awk -F accounting.awk. Nowadays UNIX pipelines are often part of how I solve problems... I admit, that may seem weird to some.

I also think the authors of unified went to the trouble of writing unified-stream for a reason and the authors of remark-cli went to the trouble of adding Pipe to the examples on their page (and making it work) for a reason. I have not checked, but I assume your plugin would break when used this way?

I get that taking the path from the file being processed is an easy answer of how to find a specified filename, but I don't think it's is at all confusing for it to fall back to the current directory (of the shell).

Why would you think about restricting an absolute path to the same filesystem? Are you thinking about this problem from a security perspective I have neglected? I can't see how this helps, but no doubt you've thought about this for longer than me.

I'd be happy to add more logic/tests around the/a fallback if you desire, I'm thinking now that it only falls back (to the current path) if the input is a pipe, but perhaps it should fall back when the file is not found (from the path of the file being processed) also. We could even make the fallbacks as options if desired?

Let me know your thoughts.

Thanks

Matt

kevin940726 commented 3 years ago

Hi Matt,

Thanks for your detailed explanation! I appreciate your feedbacks a lot ♥️.

I'm not saying using streams is unpopular. On the contrary, I think streams are very powerful and I will use them in some situations.

The problem is about plain strings being used as the markdown source of the remark-code-import plugin. remark-code-import expects the markdown source has file meta data, so that we know where the file is, and where to find the referenced file specified in the code block meta. Or else, if the source is just a plain string, then we can't be sure where to find those files. We are then being forced to guess it, unless the path is an absolute path.

We could fallback to the current working directory if a file doesn't have a path, but I feel like it's counter-intuitive and doesn't make sense in most cases. If you really want to create a file from string and assign file path to it, then you should create a vfile instance, which is more explicit.

I believe the problem here is that unified-stream doesn't create the vfile instance for you. They should have enough information to create one, but they chose to only pass down plain string to the processor. Maybe we should fix it there instead?

WDYT? Curious to hear your thoughts :)

forbesmyester commented 3 years ago

Hi Kevin,

You're probably much more familiar with Unified / Remark / certainly your plugin than me, but it does seem that currently there is only one dependency on the VFile, the one that I attempted to change. Of course, maybe you can see ways that you can further develop this project using that VFile data which I can not.

I know that I personally have always expected the following to work (in this order):

I would not have expected the following to work:

The following are in the maybe category:

So I think there, in my head at least, are filename resolution rules that are like the following:

 1) If starts with something other than a `/` it is relative, therefore:
   1) Try and get the file relative to the current file being processed
   2) Try and get the file relative the current directory (of the user)
 2) If starts with a / it is absolute then:
   1) Treat it as an absolute path
   2) It could be from the root of the project, but I would actually first expect it to be relative the root
      filesystem.
 3) If is outside of the source tree, I wouldn't be surprised for it to fail, but I'd be slightly upset as I
    think this is overreach... I think if you're using a library that loads files supplied by users, you
    should be doing the security stuff yourself (even if others also do it).

Thing is, I never think about these things and it was only after many years programming that someone pointed out this ambiguity to me (in PHP) when I realized that it's not necessarily obvious (that it's happening). It is because I think about this so rarely that I think it mostly just works, the way my brain works... but of course I would think that :-) I also see that your solution leads to it just having one rule, which is less ambiguous.

I can see that maybe unified-stream could pass your project a VFile, but I think that's a bit questionable as, what is the filename? I don't think it's right for them to make that jump, but I do think that, for me, as someone creating an application ( if you can call it that ), it is an acceptable solution (as I have more context).

kevin940726 commented 3 years ago

I appreciate your feedbacks a lot! I do!

I want to focus on my main concern here though: treating resolving file system as an opt-in or an opt-out.

If the user are using process or processSync on a file instance, then I think we can safely assume that they are opting-in resolving the file system. Which means that we can resolve the file path on the code block meta for them, and we can expect that there will be no surprise for them.

However, if the user are using a string or stream representation of a markdown source code, we're not sure if the user want to resolve the file system automatically or not. Strings and streams are portable, they can be transferred from a file system to another. We can't be sure if the user wants to access the file system, and even if they do, which file system they want. It's not only a security issue but also an ambiguity issue.

If the user decides to opt-in, it's possible to wrap the string (or stream) inside a vfile instance and pass it to remark. If they don't want this feature, then they probably don't need this plugin either. Throwing an error in the plugin helps the user to spot the potential misconfiguration in their toolchain.

but I think that's a bit questionable as, what is the filename?

It's whatever filename the file has :). If the stream is a readable stream created by fs.createReadStream, then I believe there should be a path property on the stream which unified-stream can access. If the stream is coming from process.stdin or process.stdout that doesn't have a filename, then... create one explicitly! Transform process.stdin into a readable stream with a path property and pass it to unified-stream :).


Your thoughts on the absolute path and the security concerns are very valuable! I think it makes sense to provide an option to specify the root directory or cwd. And we can throw an error if the code is trying to access a file that's out of reach. I'll give it some more thoughts! Thank you!

forbesmyester commented 3 years ago

After reading your comment I went off and checked that there's no VFileSystem, which there doesn't seem to be :-( Whole virtual filesystems seem like a great idea!

I think your thinking is sound, keeping ambiguity low is always a good idea :-) I'd be happy to contribute code to making something like this happen.

My opinions:

I'm sort of in two minds about this, because:

I think my problem is that my mind was somewhere in the middle...

Another consideration, which you rightly have pointed out earlier is, who's job is it to fix...

Something in me feels there should be a stream-to-vfile plugin, which does exactly what I've done now, which is read the entire stream and create a vfile with specified path / filename... maybe it already exists...

If such a thing existed, I feel that it would be natural for remark plugins to document it's use or at least put it as an answer when bugs / pull-requests come in like this one :-)

BTW thanks for engaging so constructively on this, particularly as I came at it from a user/bug/pr perspective.