Open miasik opened 4 months ago
The word `description`
is indeed a single token; this can be independently confirmed, for example, at https://novelai.net/tokenizer (choose CLIP Tokenizer).
In this case, it has its own dedicated embedding vector. Theoretically, it may be possible to:
But both approaches raise a question: what for?
For sure you cannot "combine back" a word from its parts if the word was a token by itself (you'll have better luck with synonyms of this word!). It is also possible to infer a (pretty random) string of tokens that would "mean" something close to what you want: https://github.com/YuxinWenRick/hard-prompts-made-easy, but that was about concatenating tokens, not merging them.
I presume that if we allowed a reasonable amount of raw token merging when calculating hard prompts, it could enhance their quality. But the main reason to use hard prompts is to be able to treat them as "text"; otherwise we can just optimize a plain old TI embedding!
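To make the hard-prompt idea concrete: the core step in hard-prompts-made-easy is snapping an optimized continuous embedding back onto the nearest real token, so the result stays usable as text. Here is a minimal sketch of that projection step with a toy, made-up embedding table (the vocabulary and vectors are hypothetical, not CLIP's; real CLIP has roughly 49k tokens of dimension 768):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table: 10 "tokens", 8-dimensional vectors (hypothetical values).
vocab = ["a", "girl", "next", "door", "woman", "house", "photo", "des", "crip", "tion"]
E = rng.normal(size=(len(vocab), 8))

def project_to_hard_token(v):
    """Nearest-neighbor projection by cosine similarity: the core step of
    hard-prompt optimization. A continuous embedding is snapped to the
    closest real token so the prompt stays 'text'."""
    sims = (E @ v) / (np.linalg.norm(E, axis=1) * np.linalg.norm(v))
    return vocab[int(np.argmax(sims))]

# A continuous vector that drifted only slightly away from "girl"
# projects back to the token "girl".
soft = E[vocab.index("girl")] + 0.05 * rng.normal(size=8)
print(project_to_hard_token(soft))
```

The same projection is what a "merge-aware" variant would have to relax: instead of one nearest token, it would search for a short weighted sum of tokens.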
My reason is the same — the civitai auto-moderator is so paranoid that it keeps sending my typical real-life pictures to the long, slow review queue. So I'm looking for a way to disguise the words it finds suspicious.
Synonyms usually work well, and using them helps me learn English, but sometimes I need to use exactly the suspicious words. For example, "a girl next door" is not about a girl; the phrase means something as a whole, and changing "girl" to something else breaks the sense. https://www.dictionary.com/e/slang/the-girl-next-door/
Couldn't you just cheat and edit the prompt in the meta info of the file? I'm sure there are PNG chunk editors out there. (At least you can pad your prompt with spaces and then replace the string with the needed one inside a hex editor, even if your disguise prompt is longer.)
No way! I'm an honorable pirate! ;-) I want to keep my prompts usable for people.
I'm an honorable pirate!
👏❤️
…You can save an embedding of "a girl next door", name it differently, and upload it as your trained TI.
Ugh, disgusting thought.
…You can save an embedding of "a girl next door", name it differently, and upload it as your trained TI. ~Ugh, disgusting thought.~
Actually, a great idea, speaking of this phrase!
Huh, if you multiply the word embeddings by a near-one constant (0.8–1.2) and merge in something related, multiplied by a low constant — so that the result is still good but not identical to the original words — it would be pretty hard to prove that the embedding was made by merging rather than by training.
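A minimal sketch of that merging trick, using hypothetical stand-in vectors rather than real CLIP embeddings (the names `girl`/`woman` and the constants 0.9 and 0.15 are just illustrative values in the ranges suggested above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for two real token embeddings (hypothetical values, not CLIP's).
girl = rng.normal(size=8)
woman = rng.normal(size=8)   # a related word to mix in

# Scale the original word by a near-one constant and add a weakly
# weighted related word, as described above.
disguised = 0.9 * girl + 0.15 * woman

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The result stays close to the original meaning but is no longer identical,
# so a merged vector is hard to distinguish from a freshly trained one.
print(cos(disguised, girl))   # high similarity, but not exactly 1.0
```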
Moreover, you can start a real training run at a very low learning rate, but initialized with your target text!
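Sketching that initialization idea: copy the target text's token vectors as the starting embedding, then train briefly at a tiny learning rate. In real TI training the gradient comes from the diffusion model's loss; here it is replaced by a placeholder quadratic pull, and all vectors are hypothetical stand-ins, just to show that a short, low-LR run barely moves the embedding away from its text initialization:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical token vectors for the target text, e.g. "a girl next door"
# (3 tokens here; real CLIP vectors would be 768-dimensional).
target_text = rng.normal(size=(3, 8))

# Initialize the trainable TI embedding from the target text instead of noise.
embedding = target_text.copy()

# Placeholder training signal: in real TI the gradient comes from the
# diffusion model; here we just pull toward some arbitrary direction.
pull = rng.normal(size=(3, 8))
lr = 1e-4                                 # very low learning rate

for _ in range(100):
    grad = embedding - pull               # gradient of 0.5*||embedding - pull||^2
    embedding -= lr * grad

def cos(a, b):
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

# The embedding has genuinely been "trained", yet it remains almost
# identical to the text it was initialized from.
print(cos(embedding.ravel(), target_text.ravel()))
```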
Let's take the word "description". It is token 13951.
Is there a way to break the word into parts ("des", "crip", "ti", "on") and then combine them as <'des' + 'crip' + 'ti' + 'on'> so as to get the same sense/token/vector/etc.?
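The obstacle can be sketched with a toy greedy longest-match tokenizer (a simplification of CLIP's BPE, where longer vocabulary entries win over their pieces). The token id 13951 for "description" is from the discussion above; the ids for the pieces are made up for illustration, as are the embedding vectors:

```python
import numpy as np

# Toy vocabulary; 13951 is the CLIP id mentioned for "description",
# the other ids are invented for this example.
vocab = {"description": 13951, "des": 101, "crip": 102, "ti": 103, "on": 104}

def tokenize(text):
    """Greedy longest-match tokenizer, a simplification of BPE:
    if the whole word is in the vocabulary, its pieces are never emitted."""
    out, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                out.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i:]!r}")
    return out

print(tokenize("description"))      # ['description'] — never the four pieces

# And even with direct access to an embedding table, the piece vectors
# do not add back up to the whole word's vector:
rng = np.random.default_rng(3)
E = {tok: rng.normal(size=8) for tok in vocab}      # toy embedding table
combined = E["des"] + E["crip"] + E["ti"] + E["on"]
print(np.allclose(combined, E["description"]))      # False
```

So from plain text you can never reach the four-piece spelling, and at the vector level the pieces carry unrelated embeddings; the merging discussed above has to work on whole-word vectors instead.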