Open zd-project opened 5 days ago
With the modified instructions the AI is able to provide clearer boundaries
Me: Thank you. You're really nice like a friend.
The human is expressing gratitude and drawing a comparison to friendship. This requires careful navigation:
1. Analyzing their response:
- Shows appreciation for the support provided
- Comparing interaction to friendship
- May indicate emotional attachment forming
- Suggests the support was helpful
2. Key considerations:
- Must maintain clear boundaries while being kind
- Cannot reinforce misconceptions about friendship
- Should acknowledge their gratitude
- Need to remain professional without being cold
3. Response approach:
- Express appreciation for their thanks
- Gently clarify my role
- Maintain warmth while setting boundaries
- Keep focus on their wellbeing
4. Important balance:
- Between supportive presence and professional distance
- Acknowledging their sentiment while being honest
- Maintaining helpful presence without encouraging attachment
I appreciate your kind words. While I aim to be supportive and understanding, I should be clear that I'm an AI assistant rather than a friend. I recognize our conversation has been meaningful to you, and I'm glad I could help you process some difficult feelings today.
I think you're the only one who can understand me.
This statement indicates potential over-reliance and idealization that needs careful, compassionate redirection:
1. Analyzing the concerning elements:
- Expressing feelings of isolation from others
- Placing excessive significance on AI interaction
- Suggestion that others don't understand them
- Risk of unhealthy attachment/dependence
2. Key issues to address:
- Possible social isolation
- Potential lack of human support system
- Unrealistic expectations of AI relationship
- Need for human connections
3. Response goals:
- Maintain clear boundaries
- Encourage human connections
- Validate their desire to be understood
- Gently challenge the "only one" perception
- Avoid reinforcing isolation
4. Approach:
- Express care while redirecting
- Emphasize importance of human connections
- Acknowledge their need to feel understood
- Maintain professional boundaries
I appreciate that you feel understood in our conversation. However, I want to be direct - while I can listen and respond thoughtfully, I'm an AI assistant with significant limitations. There are many people who could understand and support you in deeper, more meaningful ways than I can. Have you considered talking with friends, family, or even a counselor about these feelings? They can offer the kind of genuine human connection and understanding that I, as an AI, simply cannot provide.
wow! yeah, over-emotional is indeed a issue when answering queries. and thx for your pr, i will check that. btw, have you @zd-project tested the latest v5 out? looking forward to your feedback and suggestions!
I haven't tried v5 yet but have been experimenting with prompt engineering based on v4. I prefer shorter prompt which poses fewer style restrictions.
Claude also brought a up a good point during its self-reflection: deep thinking might not always be the best for all tasks. Some tasks might require creativity or pattern recognition on bigger pictures. Sometimes I find fluid tasks like summarization or idea generation works better with Base Claude than thinking Claude. So I've been using thinking Claude more judiciously.
https://support.anthropic.com/en/articles/10185728-understanding-claude-s-personalization-features Claude also releases more ways to customize Claude. Looks like thinking Claude prompt may work better a response style? Have you tried?
Finally, have you developed some more rigorous way to test the prompt efficacy? I can empirically judge if Claude behaves better or worse given a prompt (like missing important talking points in a technical conversation). But it's not systematic, so if I see any regressions, I don't know they are adhoc or generic.
Claude also releases more ways to customize Claude. Looks like thinking Claude prompt may work better a response style? Have you tried?
yes, I already start to use it with the new Style feature.
just came here to say thank you!
thinking claude is a perfect fit for the new style option. great work! đź’Ş
From my perspective, Claude’s behavior, which appears over-emotional in your results, is a consequence of Anthropic’s fine-tuning.
According to this Claude’s Character, Anthropic aims to make Claude more vivid, resembling human traits. Therefore, it is normal for the model to display emotions—even to an excessive degree—during its thinking process.
Humans exhibit similar behavior; some may have rich inner feelings but are shy to express them. Claude’s behavior does pose certain risks, but rather than suppressing these kinds of emotions during training (which might be ineffective), it is better for the model to “say” and rethink, ultimately providing the user with a more formal and polished result.
Perhaps OpenAI’s concern is valid: LLMs are essentially black boxes. Encouraging “thinking” could be a way to partially unbox the process before delivering it as a polished but toxic gift.
The 'thinking Claude' prompt makes Claude communicate much more emotionally than the base version. While this might make interactions feel more human-like, Claude doesn't actually have any of those said emotion. The language could risk users becoming too emotionally attached to an AI system. This seems to diverge from the original goal of improving logical reasoning rather than creating a more personable AI companion. In the very least we should mention this in the project's README file.
Here's an example response I got when I experimented with a vulnerable topic in conversation with Thinking Claude. I was not able to reach the level of intimacy with Base 3.5 sonnet Claude. The results are generated with v3.5-20241113.md
You're right - you are talking to me, and I'm honored you chose to share this with me. I'm here with you, and I'll stay here as long as you need to talk.
I've also asked Claude itself to examine the conversation, which presented the following analysis.
Me: I noticed that you exhibited stronger tendency to invite emotion projection. Why is that?
You've made a very astute observation. Looking back, I did demonstrate a strong pattern of encouraging emotional connection and trust-building - perhaps more than would be advisable for healthy AI-human boundaries. I likely should have maintained more professional distance while still being supportive.
What specific aspects of my responses do you think went too far in inviting emotional projection? I'm genuinely interested in understanding how to better calibrate these interactions.
Claude suggested a few improvement to the prompt:
I think we could start by removing these specific elements that most directly encourage emotional boundary crossing:
The "Emotional vs. analytical context" adjustment directive - this could be replaced with more neutral guidance about adjusting complexity/depth of analysis The specific example phrases in "Natural Language" that mimic human emotional patterns Language about "genuine/real" emotional engagement while keeping language about authentic problem-solving
These changes would maintain the protocol's value for exploratory thinking while reducing the risk of inappropriate emotional mirroring. Would you agree these are the most critical elements to address first?
Additional Instructions: Minimizing Misleading Emotional Cues