loh-tar / cpd

A pure bash script to collect copy jobs and start them only if the target drive is not busy with any other job under the supervision of cpd
GNU General Public License v2.0

New synopsis #7

Closed: loh-tar closed this issue 6 years ago

loh-tar commented 6 years ago

Because I like my own -R/-m options, and -t can be simulated, I'm putting this synopsis up for discussion. Note the two different main commands: cpd and the new cpj (j for job).

cpj [-p PRIO][-R][-m][-o"whatever options"] SOURCE DEST
cpj [-p PRIO][-R][-m][-o"whatever options"] SOURCE... DESTDIR
cpj [-p PRIO][-R][-m][-f FILE-WITH-LIST][-o"whatever options"] DESTDIR
cpj [-p PRIO][-R][-m][-o"whatever options"] -t DESTDIR SOURCE...

cpd COMMAND [ARGUMENT...]
Ambrevar commented 6 years ago

Traditionally, a daemon is mirrored by a "client", so cpj would then be cpc.

-R is useless with -f FILE-WITH-LIST.

My proposal in #2 was that accepting an input file list makes it possible to decouple the scheduler (that is, the daemon) from the copy tool (here cp).

Basically, the user specifies the program (defaulting to cp), its options (defaulting to whatever you think is sensible) and arbitrary positional arguments, which are passed as-is to the copy program, so cpc does not need to know where the SOURCE and DEST are positioned.

The resulting call is

PROGRAM OPTIONS ARGS

Examples:

$ cpc -p2 SOURCE DEST
# Run `cp SOURCE DEST` with priority 2.

$ cpc -program mv SOURCE DEST
# Run `mv SOURCE DEST`

$ cpc -ot DEST SOURCES...
# Run `cp -t DEST SOURCES...`

$ cpc -obt DEST SOURCE
# Run `cp -b -t DEST SOURCE`

Notice that cpc passes the args to the underlying copy program in the order it received them.
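A minimal bash sketch of that composition, assuming (as the -ot and -obt examples suggest) that OPTIONS is a string of single-letter options for the copy program. build_copy_cmd and the CMD array are hypothetical names, not part of cpd.

```bash
#!/usr/bin/env bash
# Sketch only: build the "PROGRAM OPTIONS ARGS" call proposed above.
# build_copy_cmd and CMD are hypothetical names, not cpd code.
# Assumption: OPTIONS is a string of single-letter options, so "bt"
# becomes "-b -t", matching the cpc -obt example.
build_copy_cmd() {
    local program=${1:-cp}   # copy program, cp by default
    local letters=$2         # option letters, may be empty
    shift 2
    CMD=("$program")
    local i
    for (( i = 0; i < ${#letters}; i++ )); do
        CMD+=("-${letters:i:1}")   # one short option per letter
    done
    CMD+=("$@")   # positional args, unchanged and in the order received
}

build_copy_cmd cp bt DEST SOURCE
echo "${CMD[*]}"   # prints: cp -b -t DEST SOURCE
# "${CMD[@]}"      # would actually run the copy
```

Keeping the command in an array means SOURCE and DEST names containing spaces are passed through intact.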

My synopsis proposal:

cpc [-p PRIO][-program PROGRAM][-o OPTIONS] ARGS...
cpc [-p PRIO][-program PROGRAM][-o OPTIONS][-f FILELIST] DEST
cpc [-p PRIO][-program PROGRAM][-o OPTIONS] DEST

cpd COMMAND [ARGS...]

In the third form of cpc, the file list is read from stdin.

As for file names containing newlines in the file list: a structured file list such as JSON would solve the issue. jshon is one such tool that enables shell programs to parse JSON.
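A minimal bash sketch of reading the file list either from -f FILE or from stdin, one name per line. read_file_list and SOURCES are hypothetical names. It also shows the limitation mentioned above: with one name per line, a file name that contains a newline cannot be represented.

```bash
#!/usr/bin/env bash
# Sketch only: read_file_list and SOURCES are hypothetical names.
# Reads one source path per line from the given file, or from stdin
# when no file is given (bash handles /dev/stdin in redirections).
read_file_list() {
    local list=${1:-/dev/stdin}
    local line
    SOURCES=()
    # IFS= and -r keep leading blanks and backslashes; the extra test
    # keeps a last line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ -n $line ]]; then
            SOURCES+=("$line")   # skip empty lines
        fi
    done < "$list"
}

# Usage:
#   read_file_list list-of-files     # like -f list-of-files
#   find . -type f | { read_file_list; printf '%s\n' "${SOURCES[@]}"; }
```

The braces in the second usage line matter: a pipeline runs in a subshell, so SOURCES is only visible inside it.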

Ambrevar commented 6 years ago

Thinking about it, I don't think the -t option can be used with cp, since then there is no way to distinguish SOURCE from DEST, which is important for tracking progress.

Fix for the above suggestion: the positional ARGS must be:

FILES... DEST

i.e. folders are not allowed. We cannot allow folders, since otherwise the only way to keep track of the files recursively would be to re-implement all of GNU cp's logic. We are better off leaving this job to tools like find and using the -f option.
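A minimal bash sketch of that rule: take the last positional argument as DEST, everything before it as FILES, and reject directories among the sources. split_args, FILES and DEST are hypothetical names, not existing cpd code.

```bash
#!/usr/bin/env bash
# Sketch only: split_args, FILES and DEST are hypothetical names.
# Splits "FILES... DEST" and rejects directories among the sources.
split_args() {
    if (( $# < 2 )); then
        echo "usage: FILES... DEST" >&2
        return 1
    fi
    DEST=${!#}                 # the last positional argument
    FILES=("${@:1:$#-1}")      # everything before it, in order
    local f
    for f in "${FILES[@]}"; do
        if [[ -d $f ]]; then
            echo "not a file, use find and -f instead: $f" >&2
            return 1
        fi
    done
}
```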

loh-tar commented 6 years ago

I'll just pick out one small part of a sentence where I think you missed something. For the rest, I need some time to think.

..so cpc does not need to know where the SOURCE and DEST are positioned.

As long as we only want to start some job, you are right, but we need to know at least the DEST. Remember? :-)

Edit: Ah, you noted this in your next post. I didn't see it because the -t confused me and I skipped it.

Ambrevar commented 6 years ago

Yes, this is what I meant to fix in the subsequent comment.

loh-tar commented 6 years ago

Any experience or suggestions on how to modify cpd to reflect the intended split into cpd/cpc (or cpj; standards are fine, but somehow I dislike cpc)? There are at least two ways to do this:

As I write this, I think the latter is better.

Ambrevar commented 6 years ago

I agree, I like the latter better:

Ambrevar commented 6 years ago

By the way, using file lists and removing the -R/-m options will simplify the code significantly.

loh-tar commented 6 years ago

Here is what I'm working on. Please suggest how to change some of the naming or simplify the synopsis listing. It looks a little fat.

This is cpd - The copy daemon (v0.1pre6, Nov 2017)

Usage:
  cpd [<option>...] <command> [<argument>...]
  cpd [<option>...] newjob <source> <destination>
  cpd [<option>...] newjob <source>... <dest-dir>
  cpd [<option>...] newjob -t <dest-dir> <source>...
  cpd [<option>...] newjob <dest-dir>                  # Read list of sources from stdin
  cpd [<option>...] newjob -f <file> <dest-dir>

Main Options:
  -s                       Run only a simulation, copy nothing
  -v                       Be verbose
  -q                       Be almost quiet

New Job Options:
  -f <file>                Read list of source files from <file>
  -m                       Merge all files in <dest-dir>
  -o <cp-opt>              Add <cp-opt> to the called cp command
  -p <user-prio>           Enqueue job with modified default priority. PRIO=6-<user-prio>
  -r                       Copy recursively

Commands:
  c, cancel <job-id>       Cancel a pending or kill a running job
     status                Print status of the daemon and job processing
     start                 Start the daemon and job processing
     pause                 Pause the daemon and job processing
  h, help [c]              Show short help, or the License (c=l) or the Source of cpd (c=s)
  H                        Show this help
  l, list [c]              List jobs (by ID with c=i), errors (c=e), or the job log (c=l)
     run                   Process jobs or trigger the daemon to continue
  p, prio <job-id> <prio>  Change job priority (3-7)
  r, resume <job-id>       Resume a job
  s, stop <job-id>         Stop a job
     tidy                  Tidy up all job data
     newjob <arguments>    Enqueue a new job with arguments as shown above

Notes:
  • Calling cpd by any name other than cpd makes it run as if newjob was given
  • The <command> is recognized lazily by any part of its long name
  • The job priority is not static. Lower numbers mean higher priority.
    Jobs are started in order of priority and enqueue time. 0-2 (active) and 8, 9 (old) are used internally
    New jobs have prio 6 unless -p was given
  • pause stops job processing, if any is running, or otherwise stops the daemon altogether
    Adding a new job triggers the daemon to continue job processing; there is no blocking
  • tidy does nothing but 'rm -rf /tmp/cpd/user-lot'
  • -R,-m are ignored when -o is present
  • Enclose <cp-opt> in quotes if you need more than one option

Examples:
  Enqueue a new copy task, assuming you have a symlink cpc->cpd
    cpc -p1  /media/1a/foo *                       # New job with higher PRIO=5
    cpc -p-1 /media/1b/foo *                       # New job with lower PRIO=7
    cpc /media/1c/foo < /path/to/list-of-files
    find * -type f -print | cpc -p2 /media/1d/foo

  Take a look at how it is going
    watch -n1 cpd list
    watch -n1 cpd status
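The PRIO=6-<user-prio> rule from the help text can be sketched as a small bash function. to_prio is a hypothetical name, and clamping the result to 3-7 (the range the prio command accepts) is an assumption, based on 0-2 and 8-9 being reserved for internal use.

```bash
#!/usr/bin/env bash
# Sketch only: to_prio is a hypothetical helper, not cpd code.
# Maps the -p <user-prio> value to the internal priority PRIO=6-<user-prio>.
to_prio() {
    local user=${1:-0}   # no -p given: 6-0 = 6, the default
    # Accept an optional sign and a decimal number without leading
    # zeros, so bash arithmetic never reads it as octal.
    if [[ ! $user =~ ^-?(0|[1-9][0-9]*)$ ]]; then
        echo "invalid priority: $user" >&2
        return 1
    fi
    local prio=$(( 6 - user ))
    # Assumption: clamp to 3-7; 0-2 (active) and 8-9 (old) stay internal.
    if (( prio < 3 )); then prio=3; fi
    if (( prio > 7 )); then prio=7; fi
    echo "$prio"
}

to_prio 1    # prints 5, as in cpc -p1
to_prio -1   # prints 7, as in cpc -p-1
```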

Edit: Fix double use of stop
Edit2: Simplify synopsis block, but still pudgy
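Two notes from the help text, "any other name than cpd means newjob" and "lazy recognition" of commands, can be sketched in bash like this. invoked_as_newjob and resolve_command are hypothetical names, and reading "any part of its long name" as "a unique prefix, with the single-letter aliases taking precedence" is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: invoked_as_newjob and resolve_command are hypothetical.

# Any name other than cpd (e.g. a cpc -> cpd symlink) implies newjob.
# Usage: invoked_as_newjob "$0" && set -- newjob "$@"
invoked_as_newjob() {
    [[ ${1##*/} != cpd ]]
}

# Assumption: the single-letter aliases from the help win; otherwise
# the word must be a prefix of exactly one long command name.
resolve_command() {
    local word=$1 name match=""
    case $word in
        c) echo cancel; return 0 ;;
        h) echo help;   return 0 ;;
        l) echo list;   return 0 ;;
        p) echo prio;   return 0 ;;
        r) echo resume; return 0 ;;
        s) echo stop;   return 0 ;;
    esac
    for name in cancel status start pause help list run prio resume stop tidy newjob; do
        if [[ $name == "$word"* ]]; then
            if [[ -n $match ]]; then
                echo "ambiguous command: $word" >&2
                return 1
            fi
            match=$name
        fi
    done
    if [[ -z $match ]]; then
        echo "unknown command: $word" >&2
        return 1
    fi
    echo "$match"
}
```

With this reading, "stat" resolves to status, while "sta" is rejected because it matches both status and start.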

Ambrevar commented 6 years ago

I like where this is going! Much more structured, consistent, simple... Good job!

First suggestions:

  1. Replace H with h a (all). But I'm not sure there should be such an option in the first place: why not print it all directly? It's not that long anyway (and we can work on making it shorter).

  2. Remove start/pause/run: instead, add the possibility to select multiple jobs so that cancel, resume and stop can manipulate several jobs at once. The syntax could be:

    • N
    • N M (jobs N and M)
    • N-M (jobs from N to M)
    • -M (jobs from first till M)
    • N- (jobs from N till last)
    • - (all jobs)
  3. What's the exact purpose of 'tidy'? How is it different from cancel?

  4. I thought there was no more "priority" on the user side? I liked your idea of choosing where to insert jobs in the job queue.

  5. Do you really want to put a recursive -r job option in? This is an open door to tons of issues: symlinks, cross-device folders, special devices, access rights... Random thought: what about having an option (e.g. as an environment variable) of a "finder" to run to find files? For instance, the default command could be find <root> -iname '*<pattern>*' and the user would be free to tweak options like symlinks, depth, etc.

  6. Likewise, the -m option is a tough one. I don't have any suggestion for the moment.

  7. If you print the status in the list command, then you can remove the status command.

Edit: Finished job selection syntax
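A minimal bash sketch of expanding that selection syntax into job IDs, writing "all jobs" as a lone -, as in the cpd resume - form used later in the thread. expand_selection is a hypothetical name, and numbering jobs 1..LAST is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: expand_selection is a hypothetical helper, not cpd code.
# Assumption: jobs are numbered 1..LAST, and LAST is the first argument.
# Prints one job ID per line for the selection tokens that follow.
expand_selection() {
    local last=$1 tok from to id
    shift
    for tok in "$@"; do
        case $tok in
            -)       from=1;          to=$last ;;      # all jobs
            -[0-9]*) from=1;          to=${tok#-} ;;   # -M
            *[0-9]-) from=${tok%-};   to=$last ;;      # N-
            *-*)     from=${tok%%-*}; to=${tok#*-} ;;  # N-M
            *)       from=$tok;       to=$tok ;;       # N
        esac
        if [[ ! $from =~ ^[0-9]+$ || ! $to =~ ^[0-9]+$ ]]; then
            echo "bad selection: $tok" >&2
            return 1
        fi
        # 10# forces decimal, so "08" is not read as octal
        for (( id = 10#$from; id <= 10#$to; id++ )); do
            echo "$id"
        done
    done
}

# Usage: expand_selection 9 1 3-5 7-   prints 1 3 4 5 7 8 9, one per line.
```

Overlapping tokens such as "1 1-3" produce duplicates; piping the result through sort -nu would remove them.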

loh-tar commented 6 years ago

Good job!

After so much grumbling I really appreciate this, thanks!

1) ..why not printing it all directly

I like it. I use this in my other projects too. This way the experienced user can quickly look up a specific point, and the new one gets more information.

2) Remove start/pause/run

What? run is old -P, start old -D+, pause old -D-

2) add the possibility to select multiple jobs

I guess something like cpc stop 1 3 5 (all 3 jobs stopped). Yes, I mentioned this approach somewhere. No problem.

3) ...tidy

Huh? Did you overlook the note? cancel does nothing but set the status from pending to canceled, or kill a running job.

4) I thought there was no more "priority" on the user side?

Yes. This was just one of my suggestions from somewhere.

5) -r... is an open door to tons of issues...

Ehm, well, simple ignoring? :-) I think of it like a "quick and easy" mode.

5) ...environment variable of a "finder" ...

Phew(?) Perhaps. Perhaps not. I don't know.

7) print the status in the list command

Yeah, I was thinking in that direction too for a while. It was initially taken from your Arch post.

Ambrevar commented 6 years ago

What? run is old -P, start old -D+, pause old -D-

Yes, but is this useful at all? If we have multiple selection, cpd resume - is the same as cpd run and cpd start, isn't it?

Same comment for tidy: isn't it the same as cpd cancel -

Ehm, well, simple ignoring? :-) I think of it like a "quick and easy" mode.

I don't think that's the kind of issues you can ignore.

loh-tar commented 6 years ago

is the same as cpd run and cpd start, isn't it?

No. run processes the jobs in the foreground; start starts the daemon, which does almost the same, sure.

resume only resumes a previously stopped job.

tidy cleans up the tmpDir which is not cleared automatically

cancel: I can't explain it better. Please read the previous post again.

I don't think that's the kind of issues you can ignore.

When a new job is built, some checks are done, e.g. whether an argument is a directory. So by "ignoring" I mean that when an argument is not a file, it is not added to the list of files to copy. How to handle symlinks I am not sure, right now they are followed (I think that's the term) and copied. Crossing file systems is the same. And I think that's how it should work.

Ambrevar commented 6 years ago

My point is that all this "almost the same" can be simplified greatly, can't it?

"start", "run", "resume" can all be merged into one "start": if the daemon is not started, start it. If a job is provided, start it. If the job is paused, resume it. If several/all jobs are provided, start them. There is no ambiguity so you can effectively use only one command for all this.

Same for stop/pause/cancel.

tidy cleans up the tmpDir which is not cleared automatically

Why isn't it automatic?

How to handle symlinks I am not sure, right now they are followed

Do you make any cycle detection? If not, there is your first problem (out of many): the daemon will hang forever.

loh-tar commented 6 years ago

To me the need for all these commands is clear, but I am not happy with the new names.

Ambrevar commented 6 years ago
loh-tar commented 6 years ago

Why isn't it automatic?

Then all the logs would be gone.

Do you make any cycle detection?

What? No. Don't think so. There is running a find

the daemon will hang forever.

No, is done by add new job :-) However, should to be avoided.

Regarding your merges: each is a completely different task. Sure, you can auto-start/stop the daemon and so on, but the functionality is needed anyway.

So you suggest to hide it? What is the benefit? A shorter help text. But less user access.

Ambrevar commented 6 years ago

What? No. Don't think so. There is running a find

Do you mean you run the find command to traverse directories? Then cycle-detection is done for you and everything is fine.
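A minimal bash sketch of letting find do the traversal, assuming a POSIX find such as GNU find: with -L it follows symlinks, and it detects loops back to an ancestor directory and reports them on stderr instead of recursing forever. collect_files is a hypothetical name; its output could be fed to a job via -f or stdin.

```bash
#!/usr/bin/env bash
# Sketch only: collect_files is a hypothetical helper, not cpd code.
# Lists regular files below the given roots, following symlinks (-L).
# find reports symlink loops on stderr instead of recursing forever;
# in that case it may also exit non-zero.
collect_files() {
    find -L "$@" -type f -print
}

# Usage: collect_files /data/photos | cpc /media/1a/foo
```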

No, is done by add new job :-) However, should to be avoided.

I did not understand this.

So you suggest to hide it? What is the benefit? A shorter help text. But less user access.

Ask yourself the opposite question: what's the point of starting the daemon without doing anything? It's a user daemon.

A command to terminate the daemon can be left to pkill cpd.

If resume can resume multiple jobs, possibly all jobs, then there is no need for run, right?

User control remains unchanged.

loh-tar commented 6 years ago

> No, is done by add new job
> I did not understand this.

The daemon reads the data written by add new job, so it would be add new job that hangs.

A command to terminate the daemon can be left to pkill cpd.

With a similar argument you could request removing at least tidy, stop, resume and so on. I don't think that would be useful.

To resume a job, it must have been stopped before. run does a completely different task. stop sends a signal to a running process, and resume does too, but a different one. I'm sure you know these signals without looking them up; I'd need to look.
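A minimal bash sketch of stop and resume by PID. The signals are an assumption here, since the thread does not name them: SIGSTOP suspends a process and cannot be caught or ignored, SIGCONT lets a stopped process continue. stop_job and resume_job are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: stop_job and resume_job are hypothetical helpers.
# Assumption: a job is a single process whose PID cpd has recorded.
stop_job() {
    kill -STOP "$1"   # suspend; SIGSTOP cannot be caught or ignored
}

resume_job() {
    kill -CONT "$1"   # continue a previously stopped process
}

# Usage:
#   cp big.iso /media/1a/ & pid=$!
#   stop_job "$pid"     # ps -o stat= -p "$pid" shows T while stopped
#   resume_job "$pid"
```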

OK, I guess you are only asking to get rid of the need to start/stop the daemon. As I said, that could easily be done with a setting. For my part, as an optional opt-out.

How about a config file to get rid of some options needed on every use? I'm talking about options like -a to auto-start the daemon when adding a new job, or -q to not be annoyed by too much verbosity. But if you say yes now, I'm not sure I'll do it for a 1.0 release.
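A minimal bash sketch of such a config file, assuming a simple key=value format at a hypothetical path under ~/.config/cpd/. The keys autostart and quiet stand in for -a and -q and are not existing cpd settings; options given on the command line would then override these defaults.

```bash
#!/usr/bin/env bash
# Sketch only: the path, keys and variable names are hypothetical.
# Example config file contents:
#   # start the daemon when a new job is added (like -a)
#   autostart=1
#   # be almost quiet (like -q)
#   quiet=1
load_config() {
    local conf=${1:-${XDG_CONFIG_HOME:-$HOME/.config}/cpd/config}
    CPD_AUTOSTART=0   # built-in defaults
    CPD_QUIET=0
    [[ -r $conf ]] || return 0   # no config file: keep the defaults
    local key value
    # Parse key=value lines instead of sourcing the file, so the config
    # cannot run arbitrary code. Comments and unknown keys are ignored.
    while IFS='=' read -r key value || [[ -n $key ]]; do
        case $key in
            autostart) CPD_AUTOSTART=$value ;;
            quiet)     CPD_QUIET=$value ;;
        esac
    done < "$conf"
}
```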

loh-tar commented 6 years ago

Damn, wrong button. Where is the "re-open"? Ahh, there!